thr3ads.net - zfs discuss - [zfs-discuss] A versioning FS [Oct 2006]

Jeremy Teo

2006-Oct-05 14:18 UTC

[zfs-discuss] A versioning FS

What would a version FS buy us that cron+ zfs snapshots doesn't?

-- 
Regards,
Jeremy

Brian Hechinger

2006-Oct-05 14:24 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 10:18:18PM +0800, Jeremy Teo wrote:
> What would a version FS buy us that cron+ zfs snapshots doesn't?

Instant file copy.  With cron you could make multiple changes between
snapshot runs.
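
The granularity gap can be illustrated with a small model. This is only a sketch under stated assumptions: the timeline values and the function name `surviving_states` are made up for illustration, and nothing here calls ZFS or cron. It shows which saves remain recoverable when snapshots run on a fixed schedule, compared with keeping every save.

```python
def surviving_states(write_times, snapshot_times):
    """Return the write times whose content is captured by some snapshot.

    A snapshot taken at time t captures only the most recent write made
    at or before t; earlier writes in the same interval are lost.
    """
    captured = set()
    for snap in snapshot_times:
        earlier = [w for w in write_times if w <= snap]
        if earlier:
            captured.add(max(earlier))
    return sorted(captured)


# Hypothetical timeline in minutes: five saves, snapshots taken hourly.
writes = [5, 20, 45, 70, 110]
snapshots = [0, 60, 120]

kept_by_snapshots = surviving_states(writes, snapshots)
kept_by_versioning = sorted(writes)  # a versioning FS would keep every save

print("recoverable from snapshots: ", kept_by_snapshots)
print("recoverable from versioning:", kept_by_versioning)
```

In this model the saves at minutes 5, 20 and 70 cannot be recovered from the snapshots, which is the "multiple changes between snapshot runs" case.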

-brian

David Dyer-Bennet

2006-Oct-05 18:19 UTC

[zfs-discuss] A versioning FS

On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
> What would a version FS buy us that cron+ zfs snapshots doesn't?

Finer granularity; no chance of missing a change.

TOPS-20 did this, and it was *tremendously* useful. Snapshots, source
control, and other alternatives aren't, in fact, alternatives.
They're useful in and of themselves, very useful indeed, but they
don't address the same needs as versioning.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Brian Hechinger

2006-Oct-05 21:01 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
> > What would a version FS buy us that cron+ zfs snapshots doesn't?
>
> Finer granularity; no chance of missing a change.
>
> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source
> control, and other alternatives aren't, in fact, alternatives.
> They're useful in and of themselves, very useful indeed, but they
> don't address the same needs as versioning.

VMS _still_ does this, and it's one of my favorite features of the OS.

-brian

Erik Trimble

2006-Oct-05 21:19 UTC

[zfs-discuss] A versioning FS

Brian Hechinger wrote:
> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>> What would a version FS buy us that cron+ zfs snapshots doesn't?
>>
>> Finer granularity; no chance of missing a change.
>>
>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source
>> control, and other alternatives aren't, in fact, alternatives.
>> They're useful in and of themselves, very useful indeed, but they
>> don't address the same needs as versioning.
>
> VMS _still_ does this, and it's one of my favorite features of the OS.
>
> -brian

I too remember VMS and the FS versioning feature.

Doing versioning at the file-system layer allows block-level changes to
be stored, so it doesn't consume enormous amounts of extra space. In
fact, it's more efficient than any versioning software (CVS, SVN,
teamware, etc) for storing versions.
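
The block-level point can be sketched with a toy copy-on-write model. This is an illustration only, not how ZFS or any real filesystem lays out data: `BlockStore`, `new_version`, and the 4-byte blocks are all assumptions made for the example. A new version allocates storage only for the blocks that changed and shares the rest with its predecessor.

```python
import itertools


class BlockStore:
    """Toy block allocator: every allocated block gets a fresh id."""

    def __init__(self):
        self.blocks = {}
        self._ids = itertools.count()

    def allocate(self, data):
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id


def new_version(store, previous, changes):
    """Return a block list that shares unchanged blocks with `previous`.

    `changes` maps a block index to its new contents; only those blocks
    are newly allocated.
    """
    version = list(previous)
    for index, data in changes.items():
        version[index] = store.allocate(data)
    return version


def read(store, version):
    """Reassemble a version's contents from its block list."""
    return b"".join(store.blocks[block_id] for block_id in version)


store = BlockStore()
v1 = [store.allocate(chunk) for chunk in (b"AAAA", b"BBBB", b"CCCC", b"DDDD")]
v2 = new_version(store, v1, {2: b"cccc"})

blocks_used = len(store.blocks)                      # blocks held for both versions
shared = sum(1 for a, b in zip(v1, v2) if a == b)    # blocks common to v1 and v2
```

Two four-block versions occupy five blocks here rather than eight. Whether that beats a delta-storing VCS in practice depends on block size and edit patterns, which this sketch does not model.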

However, there are three BIG drawbacks for using versioning in your FS 
(that assumes that it is a tunable parameter and can be turned off for a 
FS when not desired):

(1)  File listing semantics become a bit of a mess.  VMS stores versions
as <filename>;<version>    That is, it uses the semi-colon as a
divider.  Now, I'm not at all sure how we can make ZFS POSIX-compliant
and still do something like this.  Versioning filesystems tend to be a
complete mess - it is hard to present usable information about which
versions are available, and at the same time keep things clean. Even
keeping versions in a hidden dir (say .zfs_versions) in each directory
still leaves that directory filled with a huge mess of files.

(2)  File Versioning is no replacement for source code control, as you 
miss all the extra features (tagging, branching, comments, etc) that go 
with a file version check-in.

(3)  Many apps continuously save either temp copies or actual copies of 
the file you are working on. This leads to a version explosion, where 
you end up with 100s of versions of a commonly used file.  This tends to 
be worse than useless, as people have an incredibly hard time figuring 
out which (older) version they might actually want to look at.  And, 
this problem ISN'T ever going to go away, as it would require apps to
understand filesystem features for ZFS, which isn't going to happen.
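
To make the `<filename>;<version>` convention from point (1) concrete, here is a small sketch that parses such names and groups a flat listing by file. The helper names and the sample listing are hypothetical; this does not talk to VMS or ZFS, it only manipulates strings.

```python
from collections import defaultdict


def parse_versioned(name):
    """Split a name such as 'REPORT.TXT;3' into ('REPORT.TXT', 3)."""
    base, sep, version = name.rpartition(";")
    if not sep or not version.isdigit():
        raise ValueError(f"not a versioned name: {name!r}")
    return base, int(version)


def group_versions(names):
    """Map each base name to the sorted list of its version numbers."""
    groups = defaultdict(list)
    for name in names:
        base, version = parse_versioned(name)
        groups[base].append(version)
    return {base: sorted(versions) for base, versions in groups.items()}


# Hypothetical flat directory listing, as a user would see it.
listing = ["LOGIN.COM;1", "REPORT.TXT;1", "REPORT.TXT;2", "REPORT.TXT;12"]

grouped = group_versions(listing)
latest = {base: versions[-1] for base, versions in grouped.items()}
```

Version numbers are compared as integers, so `;12` sorts after `;2`; a plain string sort of the listing would not guarantee that.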

I'd discourage File Versioning at this late stage in UNIX.  Source Code
control systems fulfill the need for serious uses, and casual usage is
obviated by the mantra of "save early, save often" that has been beaten
into the userbase. Trying to change that is a recipe for disaster.

Maybe when we change filesystems to a DB, we can look at automatic 
versioning again, as a DB can mitigate #1 and #3 issues above, and can 
actually implement #2 completely.   OracleFS, here I come. (<groan>)

-Erik

Richard Elling - PAE

2006-Oct-05 21:20 UTC

[zfs-discuss] A versioning FS

Brian Hechinger wrote:
> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>> What would a version FS buy us that cron+ zfs snapshots doesn't?
>> Finer granularity; no chance of missing a change.
>>
>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source
>> control, and other alternatives aren't, in fact, alternatives.
>> They're useful in and of themselves, very useful indeed, but they
>> don't address the same needs as versioning.
>
> VMS _still_ does this, and it's one of my favorite features of the OS.

It is a real PITA if you are unfortunate enough to use quotas :-(
  -- richard

Casper.Dik at Sun.COM

2006-Oct-05 22:25 UTC

[zfs-discuss] A versioning FS

>Brian Hechinger wrote:
>> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>>> What would a version FS buy us that cron+ zfs snapshots doesn't?
>>> Finer granularity; no chance of missing a change.
>>>
>>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source
>>> control, and other alternatives aren't, in fact, alternatives.
>>> They're useful in and of themselves, very useful indeed, but they
>>> don't address the same needs as versioning.
>>
>> VMS _still_ does this, and it's one of my favorite features of the OS.
>
>It is a real PITA if you are unfortunate enough to use quotas :-(

It's one of the things I hated about VMS; so I quickly wrote a script
which on logout purged all extra copies and renamed all files back to
*;1.
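
The logout cleanup described above can be modelled in a few lines. This is a sketch over an in-memory dictionary, not the original script and not DCL: the function name `purge_and_renumber` and the sample directory contents are assumptions for illustration.

```python
def purge_and_renumber(directory):
    """Keep only the highest version of each file and renumber it to 1.

    `directory` maps a base file name to a dict of {version: content}.
    Files with no versions at all are dropped from the result.
    """
    return {
        base: {1: copies[max(copies)]}
        for base, copies in directory.items()
        if copies
    }


# Hypothetical directory state at the end of a session.
directory = {
    "REPORT.TXT": {1: "draft", 2: "second draft", 7: "final"},
    "LOGIN.COM": {3: "$ SET TERMINAL"},
}

cleaned = purge_and_renumber(directory)
```

After the call each file has exactly one copy, numbered 1, holding what used to be its newest version. The function returns a new mapping and leaves `directory` untouched.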

Casper

David Dyer-Bennet

2006-Oct-05 23:08 UTC

[zfs-discuss] A versioning FS

On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> Doing versioning at the file-system layer allows block-level changes to
> be stored, so it doesn't consume enormous amounts of extra space. In
> fact, it's more efficient than any versioning software (CVS, SVN,
> teamware, etc) for storing versions.

Comparing to cvs/svn misses the point; as I said, they address
completely different needs.

> However, there are three BIG drawbacks for using versioning in your FS
> (that assumes that it is a tunable parameter and can be turned off for a
> FS when not desired):
>
> (1)  File listing semantics become a bit of a mess.  VMS stores versions
> as <filename>;<version>    That is, it uses the semi-colon as a
> divider.  Now, I'm not at all sure how we can make ZFS POSIX-compliant
> and still do something like this.  Versioning filesystems tend to be a
> complete mess - it is hard to present usable information about which
> versions are available, and at the same time keep things clean. Even
> keeping versions in a hidden dir (say .zfs_versions) in each directory
> still leaves that directory filled with a huge mess of files.

"Complete mess" is certainly not my experience (I worked with TOPS-20
from 1977 to 1985 and VMS from 1979 to 1985).  The key is that you
need to *clean up*; specifically, you need to use the command which
deletes all but the most recent copy of each file in a directory at
the end of pretty much each work session.

It's trivial to present information on which versions are available;
you simply list each one as a file, which has the date info any file
has, and the version number.

> (2)  File Versioning is no replacement for source code control, as you
> miss all the extra features (tagging, branching, comments, etc) that go
> with a file version check-in.

It's very definitely not an alternative or replacement for source code
control, no.  It provides a very useful feature to use *alongside*
source control.  Source code control is also not a replacement for
file versioning (I end up creating spare copies of files with funny
names for things I'd otherwise get from versioning; and I end up
losing time through not having thought to create such a file, whereas
versioning is automatic).

> (3)  Many apps continuously save either temp copies or actual copies of
> the file you are working on. This leads to a version explosion, where
> you end up with 100s of versions of a commonly used file.  This tends to
> be worse than useless, as people have an incredibly hard time figuring
> out which (older) version they might actually want to look at.  And,
> this problem ISN'T ever going to go away, as it would require apps to
> understand filesystem features for ZFS, which isn't going to happen.

Files treated that way are often deleted at the end of the session
automatically, so no problem there.  Or else they'll be cleaned up
when you do your session-end cleanup.  What the heck was that command
on TOPS-20 anyway?  Maybe "purge"?  Sorry, 20-year-old memories are
fuzzy on some details.

File versioning worked a lot better on TOPS-20 than on VMS, as I
remember it.  The facility looked the same, but actually working with
it was much cleaner and easier.

Making it somewhat controllable would be useful.  Starting with maybe
an inheritable default, so some directory trees could be set not to
version.
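
An inheritable default of this kind could look roughly like the lookup below. It is a sketch only: the `settings` table, the paths, and the function `versioning_enabled` are invented for illustration, and no such ZFS property is implied to exist. Each directory may carry an explicit on/off setting, and a path takes the setting of its nearest ancestor that has one.

```python
import posixpath


def versioning_enabled(path, settings, default=True):
    """Return the versioning setting that applies to `path`.

    Walks from `path` up towards the root and returns the first explicit
    entry found in `settings`; falls back to `default` if none is set.
    """
    current = posixpath.normpath(path)
    while True:
        if current in settings:
            return settings[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without finding a setting
            return default
        current = parent


# Hypothetical per-directory settings: version the home directory,
# but not the scratch tree underneath it.
settings = {"/home/dd-b": True, "/home/dd-b/tmp": False}

in_source_tree = versioning_enabled("/home/dd-b/src/main.c", settings)
in_scratch_tree = versioning_enabled("/home/dd-b/tmp/cache/x.o", settings)
elsewhere = versioning_enabled("/var/log/messages", settings, default=False)
```

The nearest explicit setting wins, so turning versioning off for one subtree does not affect its siblings.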

> I'd discourage File Versioning at this late stage in UNIX.  Source Code
> control systems fulfill the need for serious uses, and casual usage is
> obviated by the mantra of "save early, save often" that has been beaten
> into the userbase. Trying to change that is a recipe for disaster.

Actually, "save early and often" is exactly why versioning is
important.  If you discover you've gone down a blind alley in some
code, it makes it easy to get back to the earlier spots.  This, in my
experience, happens at a detail level where you won't (in fact can't)
be doing checkins to version control.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Brian Hechinger

2006-Oct-05 23:13 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 04:08:13PM -0700, David Dyer-Bennet wrote:
> when you do your session-end cleanup.  What the heck was that command
> on TOPS-20 anyway?  Maybe "purge"?  Sorry, 20-year-old memories are
> fuzzy on some details.

It's PURGE under VMS, so knowing DEC, it was named PURGE under TOPS-20
as well.

Hmmmm, gotta get the DECsystem-2020 powered up one of these days.

-brian

Erik Trimble

2006-Oct-05 23:40 UTC

[zfs-discuss] A versioning FS

On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:
> On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
>
> > Doing versioning at the file-system layer allows block-level changes to
> > be stored, so it doesn't consume enormous amounts of extra space. In
> > fact, it's more efficient than any versioning software (CVS, SVN,
> > teamware, etc) for storing versions.
>
> Comparing to cvs/svn misses the point; as I said, they address
> completely different needs.

I was making a general point, to make it clear FS versioning isn't a
disk pig.

> > However, there are three BIG drawbacks for using versioning in your FS
> > (that assumes that it is a tunable parameter and can be turned off for a
> > FS when not desired):
> >
> > (1)  File listing semantics become a bit of a mess.  VMS stores versions
> > as <filename>;<version>    That is, it uses the semi-colon as a
> > divider.  Now, I'm not at all sure how we can make ZFS POSIX-compliant
> > and still do something like this.  Versioning filesystems tend to be a
> > complete mess - it is hard to present usable information about which
> > versions are available, and at the same time keep things clean. Even
> > keeping versions in a hidden dir (say .zfs_versions) in each directory
> > still leaves that directory filled with a huge mess of files.
>
> "Complete mess" is certainly not my experience (I worked with TOPS-20
> from 1977 to 1985 and VMS from 1979 to 1985).  The key is that you
> need to *clean up*; specifically, you need to use the command which
> deletes all but the most recent copy of each file in a directory at
> the end of pretty much each work session.
>
> It's trivial to present information on which versions are available;
> you simply list each one as a file, which has the date info any file
> has, and the version number.

I stand by the "complete mess" statement. _You_ have trained yourself to
get around the problem, by eliminating most of the reason for file
versioning - you delete everything when you log out.  A normal user (or
even, most scripts) isn't going to do this. Indeed, I would argue that
it makes no sense to implement versioning if all you are going to use it
for is on a per-session basis.

And, try thinking of a directory with a few dozen files in it, each with
a dozen or more versions. That's hideous, from a normal user standpoint.
VMS's implementation of <filename>;<version> is completely unwieldy if
you have more than a few files, or more than a few versions. And, in
modern typical use, it is _highly_ likely both will be true.

> > (2)  File Versioning is no replacement for source code control, as you
> > miss all the extra features (tagging, branching, comments, etc) that go
> > with a file version check-in.
>
> It's very definitely not an alternative or replacement for source code
> control, no.  It provides a very useful feature to use *alongside*
> source control.  Source code control is also not a replacement for
> file versioning (I end up creating spare copies of files with funny
> names for things I'd otherwise get from versioning; and I end up
> losing time through not having thought to create such a file, whereas
> versioning is automatic).

File versioning would certainly be nice in many cases, but I think it's
better implemented in the application (think of Photoshop's unlimited
undo feature, though better than that), than in the FS, where it creates
a whole lot of clutter and confusion real fast, where it is only
specifically useful for a very limited selection of files.

> > (3)  Many apps continuously save either temp copies or actual copies of
> > the file you are working on. This leads to a version explosion, where
> > you end up with 100s of versions of a commonly used file.  This tends to
> > be worse than useless, as people have an incredibly hard time figuring
> > out which (older) version they might actually want to look at.  And,
> > this problem ISN'T ever going to go away, as it would require apps to
> > understand filesystem features for ZFS, which isn't going to happen.
>
> Files treated that way are often deleted at the end of the session
> automatically, so no problem there.  Or else they'll be cleaned up
> when you do your session-end cleanup.  What the heck was that command
> on TOPS-20 anyway?  Maybe "purge"?  Sorry, 20-year-old memories are
> fuzzy on some details.

So, here's a question:  if I delete file X;1, do I delete X;x ?  That
is, do I delete all versions of a file when I delete the actual file?
What about deleting a (non-head) version?  And, exactly how many
different files have to be cleaned up when you logout?  How does this
get configured? Who does the configuring? What if I _want_ versions of
some files, but not the others?

And, what about network-sharing?  For non-interactive use?  (i.e. via
SAMBA, or other apps where you're not looking at the FS via a command
prompt?)

> File versioning worked a lot better on TOPS-20 than on VMS, as I
> remember it.  The facility looked the same, but actually working with
> it was much cleaner and easier.
>
> Making it somewhat controllable would be useful.  Starting with maybe
> an inheritable default, so some directory trees could be set not to
> version.
>
> > I'd discourage File Versioning at this late stage in UNIX.  Source Code
> > control systems fulfill the need for serious uses, and casual usage is
> > obviated by the mantra of "save early, save often" that has been beaten
> > into the userbase. Trying to change that is a recipe for disaster.
>
> Actually, "save early and often" is exactly why versioning is
> important.  If you discover you've gone down a blind alley in some
> code, it makes it easy to get back to the earlier spots.  This, in my
> experience, happens at a detail level where you won't (in fact can't)
> be doing checkins to version control.

Then, IMHO, you aren't using VC properly.  File Versioning should NEVER,
EVER, EVER be used for anything around VC.  It might be useful for
places VC isn't traditionally used (Office documents, small scripts,
etc.), but the example you provide is one which is easily solved by use
of frequent checkins to VC - indeed, that's what VC is supposed to be
for.



File versioning is really only useful when we can hide the versioning
mess from the end-user, and yet provide them with some reasonable
mechanism for accessing the file versions if need be. And we keep
versions around, period. I don't see that as being possible using the
traditional UNIX/POSIX filesystem layout.  Like I said before, maybe
when the FS becomes a RDBMS, but even then...



-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

Brian Hechinger

2006-Oct-05 23:45 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 04:40:09PM -0700, Erik Trimble wrote:
> So, here's a question:  if I delete file X;1, do I delete X;x ?  That
> is, do I delete all versions of a file when I delete the actual file?
> what about deleting a (non-head) version?  And, exactly how many

Under VMS at least, that is entirely up to you, you can delete X;1, X;2 or
X;* if you so desire.

> different files have to be cleaned up when you logout?  How does this
> get configured? Who does the configuring? What if I _want_ versions of
> some files, but not the others?

That is where it gets tricky.  Under the DEC !UNIX OSes, file versioning was
just a way of life since it was on all the time for everyone, period.  Trying
to apply that to UNIX, where file versioning previously didn't exist?  Not
so easy. ;)

> And, what about network-sharing?  For non-interactive use?  (i.e. via
> SAMBA, or other apps where you're not looking at the FS via a command
> prompt?)

A way to not allow those access to the versioning system sounds reasonable.

> File versioning is really only useful when we can hide the versioning
> mess from the end-user, and yet provide them with some reasonable
> mechanism for accessing the file versions if need be. And we keep
> versions around, period. I don't see that as being possible using the
> traditional UNIX/POSIX filesystem layout.  Like I said before, maybe
> when the FS becomes a RDBMS, but even then...

The way Digital did it is spot on; however, the use of ; is a problem once
you apply UNIX/POSIX filesystem requirements to it.  It may not work.

On the other hand ODS *is* an RDBMS really, so................. ;)

-brian

David Dyer-Bennet

2006-Oct-06 00:25 UTC

[zfs-discuss] A versioning FS

A lot of this we're clearly not going to agree on and I've said what I
had to contribute.  There's one remaining point, though...

On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:
> > Actually, "save early and often" is exactly why versioning is
> > important.  If you discover you've gone down a blind alley in some
> > code, it makes it easy to get back to the earlier spots.  This, in my
> > experience, happens at a detail level where you won't (in fact can't)
> > be doing checkins to version control.
>
> Then, IMHO, you aren't using VC properly.  File Versioning should NEVER,
> EVER, EVER be used for anything around VC.  It might be useful for
> places VC isn't traditionally use (Office documents, small scripts,
> etc.), but the example you provide is one which is easily solved by use
> of frequent checkins to VC - indeed, that's what VC is supposed to be
> for.

No, any sane VC protocol must specifically forbid the checkin of the
stuff I want versioning (or file copies or whatever) for.  It's
partial changes, probably doesn't compile, nearly certainly doesn't
work.  This level of work product *cannot* be committed to the
repository.

Well, unless you have a better VCS than CVS or SVN.  I first met this
as an obscure, buggy, expensive, short-lived SUN product, actually; I
believe it was called NSE, the Network Software Engineering
environment.  And I used one commercial product (written by an NSE
user after NSE was discontinued) that supported the feature needed.
Both of these had what I might call a two-level VCS.  Each developer
had one or more private repositories (the way people have working
directories now with SVN), but you had full VCS checkin/checkout (and
compare and rollback and so forth) within that.  Then, when your code
was ready for the repository, you did a "commit" step that pushed it
up from your private repository to the public repository.

One of the big problems with CVS and SVN and Microsoft SourceSafe is
that you don't have the benefits of version control most of the time,
because all commits are *public*.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Erik Trimble

2006-Oct-06 00:38 UTC

[zfs-discuss] A versioning FS

On Thu, 2006-10-05 at 17:25 -0700, David Dyer-Bennet wrote:
> Well, unless you have a better VCS than CVS or SVN.  I first met this
> as an obscure, buggy, expensive, short-lived SUN product, actually; I
> believe it was called NSE, the Network Software Engineering
> environment.  And I used one commercial product (written by an NSE
> user after NSE was discontinued) that supported the feature needed.
> Both of these had what I might call a two-level VCS.  Each developer
> had one or more private repositories (the way people have working
> directories now with SVN), but you had full VCS checkin/checkout (and
> compare and rollback and so forth) within that.  Then, when your code
> was ready for the repository, you did a "commit" step that pushed it
> up from your private repository to the public repository.
>
> One of the big problems with CVS and SVN and Microsoft SourceSafe is
> that you don't have the benefits of version control most of the time,
> because all commits are *public*.

Just FYI:  that buggy, expensive, short-lived SUN product eventually
became "Teamware". 

Check out (no pun intended)  Mercurial and similar products, which have
similar behavior to Teamware - each developer has a "workspace" for
code, and you can do VC inside that workspace without having to do a
putback into the "main" tree.  That way, you do frequent VC checkins,
but don't putback to the main tree until things actually work. Or, at
least, you _claim_ them to work. 

:-)




-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

Wee Yeh Tan

2006-Oct-06 01:32 UTC

[zfs-discuss] A versioning FS

On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> One of the big problems with CVS and SVN and Microsoft SourceSafe is
> that you don't have the benefits of version control most of the time,
> because all commits are *public*.

David,

That is exactly what "branch" is for in CVS and SVN.  Dunno much about
M$ SourceSafe.

-- 
Just me,
Wire ...

Chad Leigh -- Shire.Net LLC

2006-Oct-06 01:47 UTC

[zfs-discuss] A versioning FS

On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote:
> And, try thinking of a directory with a few dozen files in it, each with
> a dozen or more versions. That's hideous, from a normal user standpoint.
> VMS's implementation of <filename>;<version> is completely unwieldy if
> you have more than a few files,

No it is not. I worked for DEC and used VMS up through 1993 and
never found it unwieldy.  Even if I had 100 versions of one file.  It is

1) what you are used to

2) what you are trained to do

that makes it unwieldy or not

I find the "unix" conventions of storying a file and file~ or any of  
the other myriad billion ways of doing it that each app has invented  
to be much more unwieldy.

Yes, you have to "purge" your directories once in a while. The same
way you have to clean up any file "mess" you make on your computer
(download area, desktop, etc).

> or more than a few versions. And, in
> modern typical use, it is _highly_ likely both will be true.

So what if you have more than a few versions of a file?

Beauty is in the eye of the beholder, and just because YOU find it  
unwieldy does not make it so for the general user or anyone else.

I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very  
little so I have no remembrance of it there) file versioning built in  
to the system.

"save early, save often" ONLY makes sense with a file versioning  
system, or else you lose previous edits if you decide you have gone  
down a wrong alley.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net


Frank Cusack

2006-Oct-06 01:48 UTC

[zfs-discuss] A versioning FS

On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
> Well, unless you have a better VCS than CVS or SVN.  I first met this
> as an obscure, buggy, expensive, short-lived SUN product, actually; I
> believe it was called NSE, the Network Software Engineering
> environment.  And I used one commercial product (written by an NSE
> user after NSE was discontinued) that supported the feature needed.
> Both of these had what I might call a two-level VCS.  Each developer
> had one or more private repositories (the way people have working
> directories now with SVN), but you had full VCS checkin/checkout (and
> compare and rollback and so forth) within that.  Then, when your code
> was ready for the repository, you did a "commit" step that pushed it
> up from your private repository to the public repository.

I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
systems have this, even rcs.  Some expose all changes in the private
branch to everyone (modulo protection mechanisms), some only expose changes
that are "put back" (to use Sun teamware terminology).

Both CVS and SVN have this.

-frank

Chad Leigh -- Shire.Net LLC

2006-Oct-06 01:50 UTC

[zfs-discuss] A versioning FS

On Oct 5, 2006, at 7:47 PM, Chad Leigh -- Shire.Net LLC wrote:
> I find the "unix" conventions of storying a file and file~ or any
> of the other myriad billion ways of doing it that each app has  
> invented to be much more unwieldy.

sorry,  "storing" a file, not "storying"

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




Chad Lewis

2006-Oct-06 02:02 UTC

[zfs-discuss] A versioning FS

On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:
> On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
>> Well, unless you have a better VCS than CVS or SVN.  I first met this
>> as an obscure, buggy, expensive, short-lived SUN product, actually; I
>> believe it was called NSE, the Network Software Engineering
>> environment.  And I used one commercial product (written by an NSE
>> user after NSE was discontinued) that supported the feature needed.
>> Both of these had what I might call a two-level VCS.  Each developer
>> had one or more private repositories (the way people have working
>> directories now with SVN), but you had full VCS checkin/checkout (and
>> compare and rollback and so forth) within that.  Then, when your code
>> was ready for the repository, you did a "commit" step that pushed it
>> up from your private repository to the public repository.
>
> I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
> systems have this, even rcs.  Some expose all changes in the private
> branch to everyone (modulo protection mechanisms), some only expose
> changes that are "put back" (to use Sun teamware terminology).
>
> Both CVS and SVN have this.
>
> -frank

David is describing a different behavior. Even a branch is still
ultimately on the single, master server with CVS, SVN, and most other
versioning systems. Teamware, and a few other versioning systems, let
you have more arbitrary parent and child relationships.

In Teamware, you can create a project gate, have a variety of people
check code into this project gate, and do all of this without ever
touching the parent gate. When the project is done, you then check in
the changes to the project gate's parent.

The gate parent may itself be a child of some other gate, making the
above project gate a grand-child of some higher gate. You can also
change a child's parent, so you could in fact skip the parent and go
straight to the "grand" parent if you wish.

For that matter, you can re-parent the "parent" to sync with the
former child if you had some reason to do so.

A Teamware putback really isn't a matter of exposure. Until you do a
putback to the parent, the code is not physically (or even logically)
present in the parent.
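
The parent/child behaviour described above can be sketched with a toy model. This is not Teamware's interface or data model: the `Workspace` class and its `checkin`, `putback`, and `reparent` methods are hypothetical names chosen to mirror the terms used in this message, and the model ignores history and merge conflicts entirely.

```python
class Workspace:
    """Toy workspace holding a flat {path: content} mapping."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A child starts out as a copy of its parent's current files.
        self.files = dict(parent.files) if parent else {}

    def checkin(self, path, content):
        """Record a change locally; the parent does not see it."""
        self.files[path] = content

    def putback(self):
        """Copy this workspace's files into its current parent."""
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent to put back to")
        self.parent.files.update(self.files)

    def reparent(self, new_parent):
        """Point this workspace at a different parent."""
        self.parent = new_parent


main = Workspace("main")
main.checkin("zfs.c", "v1")
project = Workspace("project", parent=main)   # project gate
dev = Workspace("dev", parent=project)        # one developer's child

dev.checkin("zfs.c", "v2-wip")
main_before_putback = main.files["zfs.c"]     # main is untouched by the checkin

dev.putback()                                  # reaches the project gate only
project_after = project.files["zfs.c"]
main_after_first = main.files["zfs.c"]

dev.checkin("zfs.c", "v3")
dev.reparent(main)                             # skip the project gate
dev.putback()
```

In this model a checkin never touches any other workspace, a putback affects only the immediate parent, and re-parenting lets the child deliver straight to the grandparent.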

Teamware's biggest drawbacks are a lack of change sets (like how
Subversion tracks simultaneous, individual changes as a group) and that
it only runs via file access (no network protocol, filesystem or NFS
only.)

Mercurial seems to be similar to Teamware in terms of parenting, but
with network protocol support built in. Which is presumably why
OpenSolaris will be using it.

ckl

Frank Cusack

2006-Oct-06 02:23 UTC

[zfs-discuss] A versioning FS

On October 5, 2006 7:02:29 PM -0700 Chad Lewis <Chad.Lewis at Sun.COM> wrote:
> On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:
>
>> On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
>>> Well, unless you have a better VCS than CVS or SVN.  I first met this
>>> as an obscure, buggy, expensive, short-lived SUN product, actually; I
>>> believe it was called NSE, the Network Software Engineering
>>> environment.  And I used one commercial product (written by an NSE
>>> user after NSE was discontinued) that supported the feature needed.
>>> Both of these had what I might call a two-level VCS.  Each developer
>>> had one or more private repositories (the way people have working
>>> directories now with SVN), but you had full VCS checkin/checkout (and
>>> compare and rollback and so forth) within that.  Then, when your code
>>> was ready for the repository, you did a "commit" step that pushed it
>>> up from your private repository to the public repository.
>>
>> I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
>> systems have this, even rcs.  Some expose all changes in the private
>> branch to everyone (modulo protection mechanisms), some only expose
>> changes that are "put back" (to use Sun teamware terminology).
>>
>> Both CVS and SVN have this.
>>
>> -frank
>
> David is describing a different behavior. Even a branch is still
> ultimately on the single, master server with CVS, SVN, and most other
> versioning systems.  Teamware, and a few other versioning systems, let
> you have more arbitrary parent and child relationships.

How are branches not arbitrary parent and child relationships?  (except
in cvs where branches pretty much suck but still it's close)

> A Teamware putback really isn't a matter of exposure. Until you do a
> putback to the parent, the code is not physically (or even logically)
> present in the parent.

That is what I meant by exposure -- whether or not "private" code is
available to others.  But how does that matter?

The difference between teamware (or git or bk or mercurial) and cvs (or
svn or p4) here is that with the latter group everyone can see all private
branches and everyone can see each change in a private branch (again,
modulo protections).  That doesn't matter to the main branch.  The code is
not in the main branch logically (physically doesn't matter) until you
integrate or putback.

My point is that having a private branch, where you can check in changes
to your heart's content, and re-branch at will, and don't have to follow
"must compile" rules, can be handled by most any VCS.  Which is what
David was saying is needed for it to replace the functionality of a
versioned filesystem.

Some of them (eg p4) handle branching much better than others, making
this easier, but all of them can do it.

Wow, I'm surprised teamware doesn't have changelists or a similar concept.
Talk about stone ages. :-)

-frank

Michael Schuster

2006-Oct-06 07:07 UTC

[zfs-discuss] A versioning FS

I seem to remember that one could configure the max. number of versions VMS
would retain for you on a per-file basis - setting this to 1 would de facto
turn off versioning.
IFF versioning were implemented in ZFS, AND was made configurable on a
per-file basis (everything else wouldn't make any sense at all, IMO), the
default could be set to 1, to avoid the various horror scenarios that have
been painted here, and people could increase the number of versions they want
for those files that need it.
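
To make the per-file limit concrete, here is a minimal in-memory sketch in Python. It is not a ZFS or VMS interface: the class name VersionedFile and its max_versions parameter are hypothetical, chosen only to illustrate the idea that a limit of 1 keeps no history while a larger limit keeps the most recent N versions and purges the oldest.

```python
from collections import deque


class VersionedFile:
    """Toy in-memory model of a file with a per-file version limit (hypothetical)."""

    def __init__(self, max_versions=1):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        # deque(maxlen=N) silently drops the oldest entry once N is exceeded.
        self._versions = deque(maxlen=max_versions)

    def write(self, data):
        """Store a new version; the oldest is purged when over the limit."""
        self._versions.append(data)

    def current(self):
        """Return the most recent version (the one with the "plain" name)."""
        return self._versions[-1]

    def history(self):
        """Return retained versions, oldest first."""
        return list(self._versions)


plain = VersionedFile()                 # default limit of 1: no history kept
report = VersionedFile(max_versions=3)  # this file opts in to three versions
for text in ("draft 1", "draft 2", "draft 3", "draft 4"):
    plain.write(text)
    report.write(text)

print(plain.history())
print(report.history())
```

With the default of 1, only the latest write survives, which is the "de facto off" behaviour described above; the second file keeps its three most recent drafts.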

cheers
Michael

Chad Leigh -- Shire.Net LLC wrote:
>
> On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote:
>
>> And, try thinking of a directory with a few dozen files in it, each with
>> a dozen or more versions. that's hideous, from a normal user standpoint.
>> VMS's implementation of <filename>;<version> is completely unwieldy if
>> you have more than a few files,
>
> No it is not. I  worked for DEC and used VMS up through 1993 and never
> found it unwieldy.  Even if I had 100 versions of one file.  It is
>
> 1) what you are used to
>
> 2) what you are trained to do
>
> that makes it unwieldy or not
>
> I find the "unix" conventions of storing a file and file~ or any of the
> other myriad billion ways of doing it that each app has invented to be
> much more unwieldy.
>
> Yes, you have to "purge" your directories once in a while. The same way
> you have to clean up any file "mess" you make on your computer (download
> area, desktop, etc).
>
>> or more than a few versions. And, in
>> modern typical use, it is _highly_ likely both will be true.
>
> So what if you have more than a few versions of a file.
>
> Beauty is in the eye of the beholder, and just because YOU find it
> unwieldy does not make it so for the general user or anyone else.
>
> I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very
> little so I have no remembrance of it there) file versioning built in to
> the system.
>
> "save early, save often" ONLY makes sense with a file versioning system,
> or else you lose previous edits if you decide you have gone down a wrong
> alley.
>
> Chad
> 
> ---
> Chad Leigh -- Shire.Net LLC
> Your Web App and Email hosting provider
> chad at shire.net
> 
> 
> 
> 

-- 
Michael Schuster                  +49 89 46008-2974 / x62974
Recursion, n.: see 'Recursion'

Richard L. Hamilton

2006-Oct-06 07:07 UTC

[zfs-discuss] Re: A versioning FS

> What would a version FS buy us that cron+ zfs
> snapshots doesn't?

Some people are making money on the concept, so I
suppose there are those who perceive benefits:

http://en.wikipedia.org/wiki/Rational_ClearCase

(I dimly remember DSEE on the Apollos; also some sort of
versioning file type on (probably long-dead) Harris VOS
real-time OS.)
 
 
This message posted from opensolaris.org

Chad Leigh -- Shire.Net LLC

2006-Oct-06 07:14 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 1:07 AM, Michael Schuster wrote:
> I seem to remember that one could configure the max. number of  
> versions VMS would retain for you on a per-file basis - setting  
> this to 1 would de facto turn off versioning.
> IFF versioning were implemented in ZFS, AND was made configurable
> on a per-file basis (everything else wouldn't make any sense at
> all, IMO), the default could be set to 1, to avoid the various
> horror scenarios that have been painted here, and people could
> increase the number of versions they want for those files that need
> it.

Yes, it was configurable.   I don't remember if it was per file or
per directory but per file would make sense.

It would have to fit into a "unix" way of things so the top most
version (most recent) would have to have the "plain" name so that it
would work with standard unix apps that expect certain names...

I am not one to expound on how that would be done or details as I am
not by any means a guru of "low level" unix-style things.

But I would dearly like to have a versioning capability.

Best
Chad
>
> cheers
> Michael
>
> Chad Leigh -- Shire.Net LLC wrote:
>> On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote:
>>> And, try thinking of a directory with a few dozen files in it, each with
>>> a dozen or more versions. that's hideous, from a normal user standpoint.
>>> VMS's implementation of <filename>;<version> is completely unwieldy if
>>> you have more than a few files,
>> No it is not. I  worked for DEC and used VMS up through 1993 and
>> never found it unwieldy.  Even if I had 100 versions of one file.
>> It is
>> 1) what you are used to
>> 2) what you are trained to do
>> that makes it unwieldy or not
>> I find the "unix" conventions of storing a file and file~ or any
>> of the other myriad billion ways of doing it that each app has
>> invented to be much more unwieldy.
>> Yes, you have to "purge" your directories once in a while. The
>> same way you have to clean up any file "mess" you make on your
>> computer (download area, desktop, etc).
>>> or more than a few versions. And, in
>>> modern typical use, it is _highly_ likely both will be true.
>> So what if you have more than a few versions of a file.
>> Beauty is in the eye of the beholder, and just because YOU find it
>> unwieldy does not make it so for the general user or anyone else.
>> I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very
>> little so I have no remembrance of it there) file versioning built
>> in to the system.
>> "save early, save often" ONLY makes sense with a file versioning
>> system, or else you lose previous edits if you decide you have
>> gone down a wrong alley.
>> Chad
>> ---
>> Chad Leigh -- Shire.Net LLC
>> Your Web App and Email hosting provider
>> chad at shire.net
>
>
> -- 
> Michael Schuster                  +49 89 46008-2974 / x62974
> Recursion, n.: see 'Recursion'
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




przemolicc at poczta.fm

2006-Oct-06 07:40 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
>
> But I would dearly like to have a versioning capability.

Me too.
Example (real life scenario): there is a samba server for about 200
concurrently connected users. They keep mainly doc/xls files on the
server.  From time to time they (somehow) corrupt their files (they
share the files so it is possible) so they are recovered from backup.
With versioning they could be told that if their main file is
corrupted they can open the previous version and keep working.
ZFS snapshots are not a solution in this case because we would have to
create snapshots for 400 filesystems (yes, each user has their own
filesystem, and I said that there are 200 concurrent connections, but
there are many more accounts on the server) each hour or so.


przemol

Jeremy Teo

2006-Oct-06 14:59 UTC

[zfs-discuss] A versioning FS

Hello,

On 10/6/06, przemolicc at poczta.fm <przemolicc at poczta.fm> wrote:
> On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
> >
> > But I would dearly like to have a versioning capability.
>
> Me too.
> Example (real life scenario): there is a samba server for about 200
> concurrently connected users. They keep mainly doc/xls files on the
> server.  From time to time they (somehow) corrupt their files (they
> share the files so it is possible) so they are recovered from backup.
> With versioning they could be told that if their main file is
> corrupted they can open the previous version and keep working.
> ZFS snapshots are not a solution in this case because we would have to
> create snapshots for 400 filesystems (yes, each user has their own
> filesystem, and I said that there are 200 concurrent connections, but
> there are many more accounts on the server) each hour or so.

So, if I build it, people will want it? ;)
-- 
Regards,
Jeremy

Nicolas Williams

2006-Oct-06 15:04 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 02:19:46PM -0700, Erik Trimble wrote:
> Doing versioning at the file-system layer allows block-level changes to
> be stored, so it doesn't consume enormous amounts of extra space. In
> fact, it's more efficient than any versioning software (CVS, SVN,
> teamware, etc) for storing versions.

Depends on the kinds of changes...  Insert a small amount of code at the
head of a large source file and most of the blocks might change...
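
A rough way to see the effect is to compare two versions of a file block by block. The Python sketch below is only an illustration under stated assumptions: it uses a hypothetical fixed block size of 4096 bytes and naive position-by-position comparison, and it ignores how ZFS actually sizes, compresses, or checksums its records. The helper names split_blocks and changed_blocks are made up for this example.

```python
BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old, new, size=BLOCK_SIZE):
    """Count block positions whose contents differ between old and new."""
    old_blocks = split_blocks(old, size)
    new_blocks = split_blocks(new, size)
    common = min(len(old_blocks), len(new_blocks))
    differing = sum(1 for i in range(common) if old_blocks[i] != new_blocks[i])
    # Blocks that exist in only one of the two versions also count as changed.
    return differing + abs(len(old_blocks) - len(new_blocks))


# Deterministic stand-in for a large source file: 64 blocks of counter bytes.
original = b"".join(i.to_bytes(4, "big") for i in range(64 * BLOCK_SIZE // 4))
prepended = b"#include <stdio.h>\n" + original  # small insert at the head
appended = original + b"/* trailing note */\n"  # similar-sized change at the tail

print(changed_blocks(original, prepended))
print(changed_blocks(original, appended))
```

In this toy model, the insert at the head shifts every later byte, so every block differs, while the append touches only the new final block.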
> However, there are three BIG drawbacks for using versioning in your FS 
> (that assumes that it is a tunable parameter and can be turned off for a 
> FS when not desired):
> 
> [...]
> 
> Maybe when we change filesystems to a DB, we can look at automatic 
> versioning again, as a DB can mitigate #1 and #3 issues above, and can 
> actually implement #2 completely.   OracleFS, here I come. (<groan>)

The way a DB would do this would be by effectively having version-aware
interfaces.  I think we could extend ZFS to provide versioning, but we'd
have to make relevant applications version-aware, and versioning would
have to be invisible to applications that aren't version-aware (which
also must not create versions automatically).

Nico
--

Jeremy Teo

2006-Oct-06 15:25 UTC

[zfs-discuss] A versioning FS

A couple of use cases I was considering off hand:

1. Oops I truncated my file
2. Oops I saved over my file
3. Oops an app corrupted my file.
4. Oops I rm -rf the wrong directory.
All of which can be solved by periodic snapshots, but versioning gives
us immediacy.

So is immediacy worth it to you folks? I'd rather not embark on writing
and finishing code on something no one wants besides me.
-- 
Regards,
Jeremy

Nicolas Williams

2006-Oct-06 15:45 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 11:25:29PM +0800, Jeremy Teo wrote:
> A couple of use cases I was considering off hand:
>
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
> All of which can be solved by periodic snapshots, but versioning gives
> us immediacy.

There's been talk of making every transaction a snapshot.

Of course, there'd be no information as to whether a transaction
includes a file close, or truncation, or whatever.

IMO a file versioning API would be good, but file versioning should
normally be invisible, particularly to applications that are not aware
of it (which would be every application to date).

So think about the interfaces first.

I think ls(1) would have to be made version-aware.  And
cp(1)/mv(1)/ln(1).  That would be enough for a start.

Then add find/sfind and tar/star support.

And GNOME support.

Nico
--

Anton B. Rang

2006-Oct-06 16:18 UTC

[zfs-discuss] Re: A versioning FS

ClearCase is a version control system, though - not the same as file versioning.
 
 

Nicolas Williams

2006-Oct-06 16:21 UTC

[zfs-discuss] Re: A versioning FS

On Fri, Oct 06, 2006 at 09:18:16AM -0700, Anton B. Rang wrote:
> ClearCase is a version control system, though - not the same as file
> versioning.

But they have a filesystem interface.  Crucially, this involves
additional interfaces.  VC cannot be automatic.

David Dyer-Bennet

2006-Oct-06 17:18 UTC

[zfs-discuss] A versioning FS

On 10/5/06, Wee Yeh Tan <weeyeh at gmail.com> wrote:
> On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> > One of the big problems with CVS and SVN and Microsoft SourceSafe is
> > that you don't have the benefits of version control most of the time,
> > because all commits are *public*.
>
> David,
>
> That is exactly what "branch" is for in CVS and SVN.  Dunno much about
> M$ SourceSafe.

I've never encountered branch being used that way, anywhere.  It's
used for things like developing release 2.0 while still supporting 1.5
and 1.6.

However, especially with merge in svn it might be feasible to use a
branch that way.  What's the operation to update the branch from the
trunk in that scenario?
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Ed Plese

2006-Oct-06 17:55 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 09:40:22AM +0200, przemolicc at poczta.fm wrote:
> Example (real life scenario): there is a samba server for about 200
> concurrently connected users. They keep mainly doc/xls files on the
> server.  From time to time they (somehow) corrupt their files (they
> share the files so it is possible) so they are recovered from backup.
> With versioning they could be told that if their main file is
> corrupted they can open the previous version and keep working.
> ZFS snapshots are not a solution in this case because we would have to
> create snapshots for 400 filesystems (yes, each user has their own
> filesystem, and I said that there are 200 concurrent connections, but
> there are many more accounts on the server) each hour or so.

Why is creating that many snapshots a problem?  The somewhat recent addition
of recursive snapshots (zfs snapshot -r) reduces this to a single command.
Taking individual snapshots of each filesystem can take a decent amount
of time, but I was under the impression that recursive snapshots would
be much faster due to the snapshots being committed in a single transaction.
Is this not correct?
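
For reference, a cron-driven job around the recursive snapshot command could be as small as the Python sketch below. Only `zfs snapshot -r` comes from this thread; the dataset name tank/home, the "hourly" prefix, the timestamp format, and the function names are assumptions made up for illustration.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_command(dataset, when=None, prefix="hourly"):
    """Build the argv for one recursive snapshot of dataset and its children."""
    when = when or datetime.now(timezone.utc)
    name = f"{dataset}@{prefix}-{when:%Y%m%d-%H%M}"
    return ["zfs", "snapshot", "-r", name]


def take_snapshot(dataset):
    """Run the command; needs a host with ZFS and suitable privileges."""
    subprocess.run(snapshot_command(dataset), check=True)


# Build (but do not run) the command for a fixed, hypothetical point in time.
cmd = snapshot_command("tank/home",
                       datetime(2006, 10, 6, 18, 0, tzinfo=timezone.utc))
print(" ".join(cmd))
```

One invocation covers every per-user filesystem under the named dataset, so the 400-filesystem case needs one command per interval rather than 400.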


Ed Plese

Matthew Ahrens

2006-Oct-06 18:57 UTC

[zfs-discuss] A versioning FS

przemolicc at poczta.fm wrote:
> On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
>> But I would dearly like to have a versioning capability.
>
> Me too.
> Example (real life scenario): there is a samba server for about 200
> concurrently connected users. They keep mainly doc/xls files on the
> server.  From time to time they (somehow) corrupt their files (they
> share the files so it is possible) so they are recovered from backup.
> With versioning they could be told that if their main file is
> corrupted they can open the previous version and keep working.
> ZFS snapshots are not a solution in this case because we would have to
> create snapshots for 400 filesystems (yes, each user has their own
> filesystem, and I said that there are 200 concurrent connections, but
> there are many more accounts on the server) each hour or so.

I completely disagree.  In this scenario (and almost all others), use of
regular snapshots will solve the problem.  'zfs snapshot -r' is
extremely fast, and I'm working on some new features that will make
using snapshots for this even easier and better-performing.

If you disagree, please tell us *why* you think snapshots don't solve
the problem.

--matt

Matthew Ahrens

2006-Oct-06 19:02 UTC

[zfs-discuss] A versioning FS

Jeremy Teo wrote:
> A couple of use cases I was considering off hand:
> 
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
> All of which can be solved by periodic snapshots, but versioning gives
> us immediacy.
> 
> So is immediacy worth it to you folks? I rather not embark on writing
> and finishing code on something no one wants besides me.
In my opinion, the marginal benefit of per-write(2) versions over 
snapshots (which can be per-transaction, ie. every ~5 seconds) does not 
outweigh the complexity of implementation and use/administration.

--matt

Chad Leigh -- Shire.Net LLC

2006-Oct-06 19:05 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 1:02 PM, Matthew Ahrens wrote:
> Jeremy Teo wrote:
>> A couple of use cases I was considering off hand:
>> 1. Oops i truncated my file
>> 2. Oops i saved over my file
>> 3. Oops an app corrupted my file.
>> 4. Oops i rm -rf the wrong directory.
>> All of which can be solved by periodic snapshots, but versioning  
>> gives
>> us immediacy.
>> So is immediacy worth it to you folks? I rather not embark on writing
>> and finishing code on something no one wants besides me.
>
> In my opinion, the marginal benefit of per-write(2) versions over  
> snapshots (which can be per-transaction, ie. every ~5 seconds) does  
> not outweigh the complexity of implementation and use/administration.
disclaimer:  I have not used zfs snapshots a lot as I am still  
experimenting with zfs, but they appear to be similar to freebsd  
snapshots, with which I am familiar.

The user experience with snapshots, in terms of file versioning (#1,  
#2, maybe #3) is much worse than a true file versioning user  
experience.  People are oriented to their files, not to snapshots.   
And I may not want versioning with all my files (object files etc)  
which you would get with the snapshots.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




Joseph Mocker

2006-Oct-06 19:19 UTC

[zfs-discuss] A versioning FS

Matthew Ahrens wrote:
>
> If you disagree, please tell us *why* you think snapshots don't solve
> the problem.

Technically there's a race condition here. If you're taking regular
snapshots, you might see

10:25 - snapshot 1 - myfile.xls version 21
10:26 -            - myfile.xls version 22
10:27 -            - myfile.xls version 23 - corrupted
10:30 - snapshot 2 - myfile.xls version 23 - corrupted

So if you need to roll back to a previous version, the most recent 
non-corrupt version (22)  is lost.

Snapshots are a decent alternative but not as comprehensive, and perhaps
not as automatic, as people would like.
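
The timeline above can be replayed in a few lines of Python. This is a sketch under stated assumptions: times are minutes past 10:00, version 21 is assumed to have been saved at or before 10:25, and the function name versions_captured is made up for this example.

```python
def versions_captured(save_times, snapshot_times):
    """Map each snapshot time to the latest version saved at or before it."""
    captured = {}
    for snap in snapshot_times:
        latest = None
        for version, saved_at in save_times.items():
            if saved_at <= snap and (latest is None
                                     or saved_at > save_times[latest]):
                latest = version
        captured[snap] = latest
    return captured


# Minutes past 10:00; version 23 is the corrupted one.
saves = {21: 25, 22: 26, 23: 27}
snapshots = [25, 30]

captured = versions_captured(saves, snapshots)
lost = sorted(set(saves) - set(captured.values()))
print(captured)
print(lost)
```

Any version that is saved and then overwritten between two snapshot times never appears in a snapshot, which is exactly what happens to version 22 here.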

  --joe

David Dyer-Bennet

2006-Oct-06 19:22 UTC

[zfs-discuss] A versioning FS

On 10/6/06, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Jeremy Teo wrote:
> > A couple of use cases I was considering off hand:
> >
> > 1. Oops i truncated my file
> > 2. Oops i saved over my file
> > 3. Oops an app corrupted my file.
> > 4. Oops i rm -rf the wrong directory.
> > All of which can be solved by periodic snapshots, but versioning gives
> > us immediacy.
> >
> > So is immediacy worth it to you folks? I rather not embark on writing
> > and finishing code on something no one wants besides me.
>
> In my opinion, the marginal benefit of per-write(2) versions over
> snapshots (which can be per-transaction, ie. every ~5 seconds) does not
> outweigh the complexity of implementation and use/administration.
It may quite possibly not be worth adding the second, fairly similar,
facility.  In addition to the points you cite, trying to explain to
average users what the two are and when to use each one would be
fairly challenging.

All the arguments about piles of versions seem to apply in spades to
taking snapshots every 5 seconds.  And given the snapshot hierarchy,
it's much harder to find your file in the snapshot you want. Let's say
your file is 5 or 10 directories down, quite common in source trees in
my experience; you have to go back to the top and navigate to
~/.zfs/<weirdsnapshotdirectoryname>/foo/bar/mumble/bag/baz/etc/the-file-I-want.cpp
in each snapshot that might have the version you're looking for.

I'd say the snapshot system is not as good as file versioning for the
tasks I think file versioning is best for.  However, snapshotting at
very frequent intervals would definitely capture close enough to the
version I need to retrieve to make it a tolerable alternative.  The
user interface to it for retrieving a file is rather harder to use, it
seems to me, and that might possibly discourage use when it would have
been helpful.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Nicolas Williams

2006-Oct-06 19:25 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 12:02:16PM -0700, Matthew Ahrens wrote:
> In my opinion, the marginal benefit of per-write(2) versions over
> snapshots (which can be per-transaction, ie. every ~5 seconds) does not
> outweigh the complexity of implementation and use/administration.

Per-write(2) versions would be worse than useless in many, if not most
cases.  Even per-close(2) versions wouldn't always be useful.

Versions need to be captured / snapshots need to be taken when it makes
sense given what the application/user is doing.  Versioning cannot be
automated; taking periodic snapshots != capturing application state.

FS-wide snapshots make a lot of sense in general, and can serve as a
basic versioning tool for filesystems that have very specific uses
(e.g., for a database).

Per-file snapshots (versions, whatever) make sense more generally, but
need to be used by applications/users that are aware of them, else
the feature would go unused or, worse, it'd be worse than useless.

IMO, there's no urgent need for any new features around this, but if
there is a need, then it's for snapshots that aren't filesystem-wide,
with APIs so that applications can be aware of it.

Nico
--

Joseph Mocker

2006-Oct-06 19:26 UTC

[zfs-discuss] A versioning FS

Chad Leigh -- Shire.Net LLC wrote:
>
> disclaimer:  I have not used zfs snapshots a lot as I am still  
> experimenting with zfs, but they appear to be similar to freebsd  
> snapshots, with which I am familiar.
>
> The user experience with snapshots, in terms of file versioning (#1,  
> #2, maybe #3) is much worse than a true file versioning user  
> experience.  People are oriented to their files, not to snapshots.   
> And I may not want versioning with all my files (object files etc)  
> which you would get with the snapshots.
disclaimer: ditto

I tend to agree with Chad though. If you are taking snapshots every 5
seconds like Matthew suggests in an earlier reply, how does a user easily
go back to previous versions without encountering a bunch of duplicated
"versions" in the myriad of snapshots that are being taken? If the
latest snapshot is number 2000, for example, and my file was last
changed in snapshot 450, how do I easily figure that out without walking
through snapshots 1999 - 451 before finding it?
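
One way a tool could hide the duplicates is to walk the snapshot directories once and report a snapshot only when the file's state differs from the previous one. The Python sketch below is a rough illustration, not an existing ZFS utility: the function name distinct_versions is hypothetical, it assumes snapshot names sort in chronological order, and it treats (modification time, size) as a good-enough fingerprint. On a real system snapshot_root would be the filesystem's .zfs/snapshot directory; the demo uses a throwaway temporary tree instead.

```python
import os
import tempfile


def distinct_versions(snapshot_root, relative_path):
    """List (snapshot_name, mtime_ns, size) once per distinct file state.

    Snapshots are visited in name order; a snapshot is reported only when
    the file's modification time or size differs from the previous one seen.
    """
    results = []
    previous = None
    for name in sorted(os.listdir(snapshot_root)):
        candidate = os.path.join(snapshot_root, name, relative_path)
        try:
            info = os.stat(candidate)
        except FileNotFoundError:
            continue  # the file did not exist in this snapshot
        state = (info.st_mtime_ns, info.st_size)
        if state != previous:
            results.append((name, *state))
            previous = state
    return results


# Demo against a throwaway directory tree standing in for <fs>/.zfs/snapshot.
with tempfile.TemporaryDirectory() as root:
    contents = ["v1", "v1", "v2!", "v2!"]
    mtimes = [1000, 1000, 2000, 2000]
    for index, (text, mtime) in enumerate(zip(contents, mtimes)):
        snap_dir = os.path.join(root, f"snap-{index:04d}")
        os.makedirs(snap_dir)
        path = os.path.join(snap_dir, "report.txt")
        with open(path, "w") as handle:
            handle.write(text)
        os.utime(path, (mtime, mtime))  # pin atime and mtime for the demo
    found = [name for name, _, _ in distinct_versions(root, "report.txt")]
    print(found)
```

This still visits every snapshot, so it answers the usability question (the user sees two versions, not four) but not the cost of scanning thousands of snapshots.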

  --joe

Erik Trimble

2006-Oct-06 21:08 UTC

[zfs-discuss] A versioning FS

First of all, let's agree that this discussion of File Versioning makes
no more reference to its usage as Version Control.  That is, we aren't
going to talk about it being useful for source code, other than in the
context where a source code file is a document, like any other text
document.  File Versioning and Version Control are separate things, with
different purposes and feature sets.


OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a
new API.  Using the VMS convention of simply creating file names with a
version string afterwards is unacceptable, as it creates enormous
directory pollution, not to mention user confusion.  So, FV has to be
invisible to non-aware programs.

Now we have a problem:  how do we access FV for non-local (e.g. 
SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is in 
the network file server arena, unless we can use FV over the network, it 
is useless.  You can't modify the SMB or NFS protocol (easily or
quickly) to add FV functionality (look how hard it was to add ACLs to 
these protocols).

About the only way I can think around this problem is to store versions 
in a special subdir of each directory (e.g. .zfs_version), which would 
then be browsable over the network, using tools not normally FV-aware.  
But this puts us back into the problem of a directory which potentially 
has hundreds or thousands of files.

Also, "save-early-save-often" results in a version explosion, as does
auto-save in the app.  While this may indeed mean that you have all of
your changes around, figuring out which version has them can be
massively time-consuming.  Let's say you have auto-save set for 5
minutes (very common in MS Word). That gives you 12 versions per hour.
If you suddenly decide you want to back up a couple of hours, that
leaves you with looking at a whole bunch of files, trying to figure out
which one you want.  E.g. I want a file from about 3 hours ago. Do I
want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours
ago?  And, what if I've mis-remembered, and it really was closer to 4
hours ago?  Yes, the data is eventually there. However, wouldn't a
1-hour snapshot capability have saved you an enormous amount of time, by
being able to simplify your search? (And, yes, you won't have _exactly_
the version you want, but odds are you will have something close, and
you can put all the time you would have spent searching the FV tree into
restarting work from the snapshot-ed version.)

Remember, FV's main audience is going to be "naive" users, not us
technical users, who generally have the problem that FV solves under
control (yes, FV would make it easier for us, but we're not the primary
target).  Version explosion (and the consequential problem of picking
the right version to edit) is a huge problem for the naive audience.

Also, a big difference between Snapshots and FV tends to be who controls
EOL-ing a version/Snapshot.  Snapshots tend to be done by the Admin, and
their aging strictly controlled and defined (e.g. "we keep hourly
snapshots for 1 week"). File versioning is typically under the control
of the End-User, as its utility is much more nebulously defined.
Certainly, there is no ability to truncate based on number of versions
(e.g. "we only allow 100 versions to be kept"), since the frequency of
versioning a file varies widely.  Aging on a version is possibly a
better answer, but this runs into a problem of user education, where we
have to retrain our users to stop making frequent copies of important
documents (like they do now, in absence of FV), but _do_ remember to dig
through the FV archive periodically to save a desirable old copy.
Also, if managing FV is to be a User task, how are they to do it over
NFS/SAMBA?  And, "log into the NFS server to do a cleanup" isn't an
acceptable answer.

Also, FV is only useful for apps which do a "close()" on a file (or at
least, I'm assuming we wait for a file to signal that it is closed
before taking a version - otherwise, we do what? take a version every X
minutes while the file is still open? I shudder to think about the
implementation of this, and its implications...).  How many apps keep a
file open for a long period of time?  FV isn't useful to them, only an
"unlimited undo" functionality INSIDE the app.

Lastly, consider the additional storage requirement of FV, and exactly
how much utility you gain for sacrificing disk space.
Look at this scenario:  I'm editing a file, making 1MB of change per 5
minutes (a likely scenario when actively editing any Office-style
document), of which only 50% do I actually make permanent (the rest
being temp edits for ideas I decide to change or throw out).  If I'm
auto-saving every 5 minutes, that means I use 12MB of version space per
hour. If I took an hourly snapshot, then I need only 6MB of storage.  The
situation gets worse, for the primary usefulness of FV is for files
which are frequently edited - meaning that they have rapid content change,
and not in append-mode. Such a usage pattern means that FV will take up
a much greater amount of space than periodic snapshots, as the longer
interval in snapshots will allow the changes to "settle".
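
The arithmetic in that scenario can be written out as a short Python sketch. The figures (1MB per auto-save, a save every 5 minutes, 50% of edits kept) are the ones given above; the function and constant names are made up for illustration, and the model deliberately ignores block sharing between versions.

```python
def version_space_mb(change_mb_per_save, saves_per_hour):
    """Space used per hour when every auto-save is kept as its own version."""
    return change_mb_per_save * saves_per_hour


def snapshot_space_mb(change_mb_per_save, saves_per_hour, permanent_fraction):
    """Space used by one hourly snapshot holding only the surviving edits."""
    return change_mb_per_save * saves_per_hour * permanent_fraction


CHANGE_MB = 1        # 1MB of change per auto-save
SAVES_PER_HOUR = 12  # auto-save every 5 minutes
PERMANENT = 0.5      # half of the edits are kept

fv = version_space_mb(CHANGE_MB, SAVES_PER_HOUR)
snap = snapshot_space_mb(CHANGE_MB, SAVES_PER_HOUR, PERMANENT)
print(fv, snap, fv / snap)
```

Under these assumptions per-save versioning costs twice the space of an hourly snapshot, and the ratio grows as the fraction of edits that survive shrinks.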


To me, FV is/was very useful in TOPS-20 and VMS, where you were looking
at a system DESIGNED with the idea in mind, already had a user base
trained to use and expect it, and virtually all usage was local (i.e. no
network filesharing). None of this is true in the UNIX/POSIX world.


-Erik

Erik Trimble

2006-Oct-06 21:14 UTC

[zfs-discuss] A versioning FS

Chad Leigh -- Shire.Net LLC wrote:
> disclaimer:  I have not used zfs snapshots a lot as I am still
> experimenting with zfs, but they appear to be similar to freebsd 
> snapshots, with which I am familiar.
>
> The user experience with snapshots, in terms of file versioning (#1, 
> #2, maybe #3) is much worse than a true file versioning user 
> experience.  People are oriented to their files, not to snapshots.  
> And I may not want versioning with all my files (object files etc) 
> which you would get with the snapshots.
>
> Chad
>

You can't turn off and on File Versioning at the file level. At least, I
can't imagine trying to support (i.e. write) this kind of functionality
into ZFS.  File Versioning would be a tunable parameter for each
filesystem.   So, you'd have to store your object files on a different
filesystem than your code. Which would make snapshots no different than
FV, w/r/t keeping versions of the code, and not the object files.

-Erik

Chad Leigh -- Shire.Net LLC

2006-Oct-06 21:26 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 3:14 PM, Erik Trimble wrote:
> Chad Leigh -- Shire.Net LLC wrote:
>> disclaimer:  I have not used zfs snapshots a lot as I am still  
>> experimenting with zfs, but they appear to be similar to freebsd  
>> snapshots, with which I am familiar.
>>
>> The user experience with snapshots, in terms of file versioning  
>> (#1, #2, maybe #3) is much worse than a true file versioning user  
>> experience.  People are oriented to their files, not to  
>> snapshots.  And I may not want versioning with all my files  
>> (object files etc) which you would get with the snapshots.
>>
>> Chad
>>
> You can't turn off and on File Versioning at the file level. At  
> least, I can't imagine trying to support (i.e. write) this kind of
> functionality into ZFS.
???  I will admit that I am not involved with the ZFS code, but it  
would seem that extensible meta data should make this easy.  From  
reading some threads in a forum about Apple's possible use of ZFS  
(conjecture in some forums) a Sun engineer mentioned that ZFS was  
easily extensible in the meta data arena so that Apple should have no  
problems meeting their requirements.  Was this incorrect?
> File Versioning would be a tunable parameter for each filesystem.    
> So, you'd have to store your object files on a different filesystem
> than your code. Which would make snapshots no different than FV,
> w/r/t keeping versions of the code, and not the object files.
The problem is that you are stuck on snapshots and cannot think  
"outside of the box".  All your implementations you are thinking of  
are constrained by your tunnel vision.

This is not meant as a personal attack.  It is just that the  
arguments you put forth (in your long post which I am in the middle  
of replying to) show that this tunnel vision is readily apparent.

Chad
>
> -Erik
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net



Chad Leigh -- Shire.Net LLC

2006-Oct-06 21:30 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> First of all, let's agree that this discussion of File Versioning
> makes no more reference to its usage as Version Control.  That is,  
> we aren't going to talk about it being useful for source code,  
> other than in the context where a source code file is a document,  
> like any other text document.  File Versioning and Version Control  
> are separate things, with different purposes and feature sets.
>
>
> OK. So, now we're on to FV.  As Nico pointed out, FV is going to  
> need a new API.  Using the VMS convention of simply creating file  
> names with a version string afterwards is unacceptable, as it  
> creates enormous directory pollution,
Assumption, not supported.  "Eye of the  beholder."
> not to mention user confusion.
Assumption, not supported.
> So, FV has to be invisible to non-aware programs.
yes
>
> Now we have a problem:  how do we access FV for non-local (e.g.  
> SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is  
> in the network file server arena,
Assumption, and definitely not supported.   It is very useful outside  
of the file sharing arena.
> unless we can use FV over the network, it is useless.
Wrong
> You can't modify the SMB or NFS protocol (easily or quickly) to add
> FV functionality (look how hard it was to add ACLs to these  
> protocols).
>
> About the only way I can think around this problem is to store  
> versions in a special subdir of each directory (e.g. .zfs_version),  
> which would then be browsable over the network, using tools not  
> normally FV-aware.  But this puts us back into the problem of a  
> directory which potentially has hundreds or thousands of files.
This directory way of doing it is not a good way.  It fails the  
ease-of-use test for the end user.

The VMS way is far superior.  The problem is that you have to make  
sure that apps that are not FV aware have no problems, which means  
you cannot just append something to the actual file name. It has to  
be some sort of meta data.
>
> Also, "save-early-save-often"  results in a version explosion, as
> does auto-save in the app.
Does not have to.  In VMS it is configurable on how many versions you  
want to save before it does an auto purge. A simple purge command  
then cleans things up for you.  Very minimal requirements for  
"retraining" the user.  Set the default configuration to be a max of  
1 version and you have no problems unless you turn it on.
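
A minimal sketch of the version-limit idea described above. This is a toy in-memory model, not how VMS or ZFS implements anything; the class name, `version_limit`, and `purge` are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional


class VersionedFile:
    """Toy model of a file that keeps at most `version_limit` versions."""

    def __init__(self, version_limit: int = 1) -> None:
        if version_limit < 1:
            raise ValueError("version_limit must be at least 1")
        self.version_limit = version_limit
        self.versions: Dict[int, bytes] = {}
        self._next_number = 1

    def save(self, data: bytes) -> int:
        """Store a new version, then auto-purge anything beyond the limit."""
        number = self._next_number
        self._next_number += 1
        self.versions[number] = data
        self.purge()
        return number

    def purge(self, keep: Optional[int] = None) -> List[int]:
        """Delete all but the newest `keep` versions; return the numbers removed."""
        keep = self.version_limit if keep is None else keep
        if keep < 1:
            raise ValueError("keep must be at least 1")
        removed = sorted(self.versions)[:-keep]
        for number in removed:
            del self.versions[number]
        return removed


# With the default limit of 1, the file behaves like an unversioned file.
plain = VersionedFile()
plain.save(b"first")
plain.save(b"second")

# With a limit of 3, five saves leave only the three newest versions.
doc = VersionedFile(version_limit=3)
for text in (b"a", b"b", b"c", b"d", b"e"):
    doc.save(text)
```

The default of one version is what keeps the feature invisible until a user opts in, which is the point being made about retraining.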
> While this may indeed mean that you have all of your changes  
> around, figuring out which version has them can be massively time- 
> consuming.
Your assumption.  (And much less hard than using snapshots).
> Let's say you have auto-save set for 5 minutes (very common in MS
> Word). That gives you 12 versions per hour.
So?
> If you suddenly decide you want to back up a couple of hours, that  
> leaves you with looking at a whole bunch of files, trying to figure  
> out which one you want.  E.g. I want a file from about 3 hours ago.  
> Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15  
> hours ago?
Look at the file create time.  Take a quick look at the contents if  
you are confused.  At least you HAVE the capability to go back.
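
Picking a version by creation time can be sketched as follows. The function name and the sample timestamps are hypothetical; the only assumption is that each version carries a creation time that a tool can read.

```python
from datetime import datetime, timedelta
from typing import Dict


def closest_version(created: Dict[int, datetime], wanted: datetime) -> int:
    """Return the version number whose creation time is nearest to `wanted`."""
    if not created:
        raise ValueError("no versions to choose from")
    return min(created, key=lambda number: abs(created[number] - wanted))


# Hypothetical data: versions 1..7 saved every 5 minutes,
# from 3h15m ago up to 2h45m ago.
now = datetime(2006, 10, 6, 21, 30)
oldest = now - timedelta(hours=3, minutes=15)
created = {n: oldest + timedelta(minutes=5 * (n - 1)) for n in range(1, 8)}

# "I want the file from about 3 hours ago."
picked = closest_version(created, now - timedelta(hours=3))
print(picked)
```

If the remembered time is off (say it was really closer to 4 hours ago), the same call simply returns the oldest version in this data set.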
>   And, what if I've mis-remembered, and it really was closer to 4
> hours ago?
Simple file system tools help me find it.
> Yes, the data is eventually there. However, wouldn't a 1-hour  
> snapshot capability have saved you an enormous amount of time,
No.  Managing the versions is not hard like you say.  I lived on VMS  
for years and it was never a problem.  It is your mindset and your  
preconceived notions that are the problem.
> by being able to simplify your search (and, yes, you won't have  
> _exactly_ the version you want, but odds are you will have  
> something close, and you can put all the time you would have spent  
> searching the FV tree into restarting work from the snapshot-ed  
> version).
I would much rather take an extra 2 minutes futzing around with the  
FV saved versions than trying to recreate what I had done.  And  
snapshots are not user friendly from a UI perspective -- funny  
strange directories and having to dig around in them.
>
> Remember, FV's main audience is going to be "naive" users, not us
> technical users,
No, it is US technical users as much as the naive user.
> who generally have the problem that FV solves under control (yes,  
> FV would make it easier for us, but we're not the primary target).
We do?  I have often edited system files and then wanted to go back  
to something I deleted earlier as I realized it was the wrong one.
> Version explosion (and the consequential problem of picking the  
> right version to edit) is a huge problem for the naive audience.
>
This statement is naive itself and is unsupportable.  Where are the  
usability tests that support this?  VMS has a LONG HISTORY and is/was  
used by a lot of what you call "naive" users.  FV never caused any  
problems that I encountered or indeed that DEC encountered, as it  
never once came up as an issue with VMS usability.
> Also, a big difference between Snapshots and FV tends to be who  
> controls EOL-ing a version/Snapshot.  Snapshots tend to be done by  
> the Admin, and their aging strictly controlled and defined (e.g.  
> "we keep hourly snapshots for 1 week"). File versioning is  
> typically under the control of the End-User, as their utility is  
> much more nebulously defined.   Certainly, there is no ability to  
> truncate based on number of versions (e.g. "we only allow 100  
> versions to be kept"), since the frequency of versioning a file  
> varies widely.  Aging on a version is possibly a better answer, but  
> this runs into a problem of user education, where we have to  
> retrain our users to stop making frequent copies of important  
> documents (like they do now, in absence of FV), but _do_ remember  
> to dig through the FV archive periodically to save a desirable old  
> copy.   Also, if  managing FV is to be a User task, how are they to  
> do it over NFS/SAMBA?  And, "log into the NFS server to do a  
> cleanup" isn't an acceptable answer.
>
> Also, FV is only useful for apps which do a "close()" on a file (or
> at least, I'm assuming we wait for a file to signal that it is  
> closed before taking a version - otherwise, we do what? take a  
> version every X minutes while the file still open? I shudder to  
> think about the implementation of this, and its implications...).   
> How many apps keep a file open for a long period of time?  FV isn't
> useful to them, only an "unlimited undo" functionality INSIDE the
> app.
Yes, any time you do a close() or equivalent. The idea is not to  
implement a universal undo stack.

You can always find a scenario where FV doesn't help.  So what.   
There are lots of scenarios where it does help.  More positive  
scenarios than you can dream up negatives for.
>
> Lastly, consider the additional storage requirement of FV, and  
> exactly how much utility you gain for sacrificing disk space.
We have GB and TB of cheap space.  A few extra versions lying around  
until people hit their quotas is the users' issue, not the sysadmin's.
> Look at this scenario:  I'm editing a file, making 1MB of change  
> per 5 minutes (a likely scenario when actively editing any
> Office-style document), of which I actually make only 50% permanent  
> (the rest being temp edits for ideas I decide to change or throw  
> out).  If I'm auto-saving every 5 minutes, that means I use 12MB of
> version space per hour. If I took an hourly snapshot, then I need  
> only 6MB of storage.
So?  Your snapshot is much less useful, and 12MB is nothing in today's  
GBs of cheap space.  Probably compressed too, so even less usage than  
you envision.
> The situation gets worse, for the primary usefulness of FV is for  
> files which are frequently edited - meaning that they have rapid  
> content change, and not in append-mode. Such a usage pattern means  
> that FV will take up a much greater amount of space than periodic  
> snapshots, as the longer interval in snapshots will allow the  
> changes to "settle".
Not an issue.  Cheap disk space.
>
>
> To me, FV is/was very useful in TOPS-20 and VMS, where you were  
> looking at a system DESIGNED with the idea in mind, already have a  
> user base trained to use and expect it, and virtually all usage was  
> local (i.e. no network filesharing). None of this is true in the  
> UNIX/POSIX world.
And that does not affect its usefulness.

Chad
>
>
> -Erik
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net



David Dyer-Bennet

2006-Oct-06 21:47 UTC

[zfs-discuss] A versioning FS

On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> First of all, let's agree that this discussion of File Versioning makes
> no more reference to its usage as Version Control.  That is, we aren't
> going to talk about it being useful for source code, other than in the
> context where a source code file is a document, like any other text
> document.  File Versioning and Version Control are separate things, with
> different purposes and feature sets.
Hmm; the most important uses of file versioning come, in my opinion,
when working on source code.  But for handling very different
situations than source control does.
> OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a
> new API.  Using the VMS convention of simply creating file names with a
> version string afterwards is unacceptable, as it creates enormous
> directory pollution, not to mention user confusion.  So, FV has to be
> invisible to non-aware programs.
Strongly disagree, twice.

Having FV invisible to programs not updated to specially support it is
IMHO unacceptable, and would render the feature useless.

I remember it being a bit inconvenient on VMS.  It wasn't on TOPS-20.
I'll have to look into what the TOPS-20 conventions were again (I used
TOPS-20 from 1977 to 1985, but have hardly touched it since), but I found
them very friendly and easy to work with, not confusing, etc.  They
weren't *that* different from the VMS approach, but this is probably
one of those situations where tiny tweaks to user interface make a
huge difference to user experience.
> Also, FV is only useful for apps which do a "close()" on a file (or at
> least, I'm assuming we wait for a file to signal that it is closed
> before taking a version - otherwise, we do what? take a version every X
> minutes while the file still open? I shudder to think about the
> implementation of this, and its implications...).  How many apps keep a
> file open for a long period of time?  FV isn't useful to them, only an
> "unlimited undo" functionality INSIDE the app.
It's the rewrite scenario; when we open or rename a file on top of an
existing file, the new file gets an incremented version number, and
the old file stays around.
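
A small sketch of that rewrite scenario, as a toy in-memory directory. The class and method names are hypothetical, and this models only the numbering behaviour described above, not any real TOPS-20, VMS, or ZFS interface.

```python
from typing import Dict, List


class VersionedDirectory:
    """Toy directory where writing over a name adds a version instead of replacing."""

    def __init__(self) -> None:
        # file name -> {version number -> contents}
        self._files: Dict[str, Dict[int, bytes]] = {}

    def create(self, name: str, data: bytes) -> int:
        """Create `name`; if it already exists, the old versions stay around."""
        versions = self._files.setdefault(name, {})
        number = max(versions, default=0) + 1
        versions[number] = data
        return number

    def rename(self, src: str, dst: str) -> int:
        """Move the newest version of `src` on top of `dst` as a new version."""
        versions = self._files[src]
        data = versions.pop(max(versions))
        if not versions:
            del self._files[src]
        return self.create(dst, data)

    def versions(self, name: str) -> List[int]:
        return sorted(self._files.get(name, {}))


d = VersionedDirectory()
first = d.create("report.txt", b"draft 1")
second = d.create("report.txt", b"draft 2")   # rewrite in place
d.create("report.tmp", b"draft 3")            # editor writes a temp file...
third = d.rename("report.tmp", "report.txt")  # ...then renames it over the original
```

Both common save strategies (rewrite in place, and write-temp-then-rename) end up producing a new version without the application knowing anything about versioning.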
> Lastly, consider the additional storage requirement of FV, and exactly
> how much utility you gain for sacrificing disk space.
It was something we could afford, and did afford, on TOPS-20 systems
where having three RP06 disk pack systems (at 200MB each) was
considered rather a lot of storage.  Today it's a complete non-issue.
Disk space is free.
> To me, FV is/was very useful in TOPS-20 and VMS, where you were looking
> at a system DESIGNED with the idea in mind, already have a user base
> trained to use and expect it, and virtually all usage was local (i.e. no
> network filesharing). None of this is true in the UNIX/POSIX world.
When TOPS-20 was introduced, essentially nobody was used to file
versioning.  When VMS was introduced, very few people were used to
file versioning (and the TOPS-20 community mostly moved to Unix rather
than VMS).  TOPS-20 wasn't the first system I used (it was the fifth,
I think), or even the first timesharing system (the third, I believe).
File versioning was one of those "instant love" features; it was
instantly obvious how it worked, how to use it, and how beneficial it
was.

I see network file access as a non-issue; the version gets treated as
part of the file name, just as it did on all the previous systems that
supported file versioning.

I'm *still* not really sure it's actually worth the trouble of adding,
if 5-second snapshots are really feasible.  They're less convenient to
use by quite a bit, but the important use cases arise relatively
rarely, and the value is high when they arise, so that's not *too* big
an issue.  And versioning adds code complexity and user confusion (I don't
think versioning is terribly complex to understand, but certainly
snapshots plus versioning is more complex than snapshots alone).  But
if people are going to decide against file versioning, I'd prefer it
to be based on a more accurate understanding of how it plays to users
:-).
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Nicolas Williams

2006-Oct-06 21:53 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> >OK. So, now we're on to FV.  As Nico pointed out, FV is going to
> >need a new API.  Using the VMS convention of simply creating file  
> >names with a version string afterwards is unacceptable, as it  
> >creates enormous directory pollution,
> 
> Assumption, not supported.  "Eye of the  beholder."
No, you really need an API, otherwise you have to guess when to snapshot
versions of files.
> >not to mention user confusion.
> 
> Assumption, not supported.
Maybe Erik would find it confusing.  I know I would find it _annoying_.
> >So, FV has to be invisible to non-aware programs.
> 
> yes
Interesting that you agree with this when you disagree with Erik's other
points!  To me this statement implies FV APIs.
> >Now we have a problem:  how do we access FV for non-local (e.g.  
> >SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is  
> >in the network file server arena,
> 
> Assumption, and definitely not supported.   It is very useful outside  
> of the file sharing arena.
I agree with you, and I agree with Erik.  We, Sun engineers that is,
need to look at the big picture, and network access is part of the big
picture.
> >unless we can use FV over the network, it is useless.
> 
> Wrong
Yes, but we have to provide for it.
> >You can't modify the SMB or NFS protocol (easily or quickly) to add
> >FV functionality (look how hard it was to add ACLs to these  
> >protocols).
> >
> >About the only way I can think around this problem is to store  
> >versions in a special subdir of each directory (e.g. .zfs_version),  
> >which would then be browsable over the network, using tools not  
> >normally FV-aware.  But this puts us back into the problem of a  
> >directory which potentially has hundreds or thousands of files.
> 
> This directory way of doing it is not a good way.  It fails the ease  
> of use to the end user test.
No, it doesn't: it doesn't preclude having FV-aware UIs that make it
easier to access versions.  All Erik's .zfs_version proposal is about is
remote access, not a user interface.
> The VMS way is far superior.  The problem is that you have to make  
> sure that apps that are not FV aware have no problems, which means  
> you cannot just append something to the actual file name. It has to  
> be some sort of meta data.
I.e., APIs.

The big question though is: how to snapshot file versions when they are
touched/created by applications that are not aware of FV?

Certainly not with every write(2).  At fsync(2), close(2), open(2) for
write/append?  What if an application deals in multiple files?  Etc...

Automatically capturing file versions isn't possible in the general case
with applications that aren't aware of FV.
> >While this may indeed mean that you have all of your changes  
> >around, figuring out which version has them can be massively time- 
> >consuming.
> 
> Your assumption.  (And much less hard than using snapshots).
I agree that with ZFS snapshots it could be hard to find the file
versions you want.  I don't agree that the same isn't true with FV
*except* where you have FV-aware applications.
> Yes, any time you do a close() or equivalent. The idea is not to  
> implement a universal undo stack.
Or open(2) for write, fsync(2)s, unlinks.  Maybe.  It could work for
some apps and not for others.

(I really wouldn't want building code to result in lots of file versions
of intermediate and end-result files!)

Nico
--

Chad Leigh -- Shire.Net LLC

2006-Oct-06 22:06 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net  
> LLC wrote:
>> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>> OK. So, now we're on to FV.  As Nico pointed out, FV is going to
>>> need a new API.  Using the VMS convention of simply creating file
>>> names with a version string afterwards is unacceptable, as it
>>> creates enormous directory pollution,
>>
>> Assumption, not supported.  "Eye of the  beholder."
>
> No, you really need an API, otherwise you have to guess when to  
> snapshot
> versions of files.
What does "snapshot versions of files" mean?

My line "Assumption, not supported.  "Eye of the beholder"" was in
reference to "enormous directory pollution".
>
>>> not to mention user confusion.
>>
>> Assumption, not supported.
>
> Maybe Erik would find it confusing.  I know I would find it  
> _annoying_.
Then leave it set to 1 version
>
>>> So, FV has to be invisible to non-aware programs.
>>
>> yes
>
> Interesting that you agree with this when you disagree with Erik's
> other points!  To me this statement implies FV APIs.
It has to do with the implementation details.  I don't know what sort  
of APIs you are saying are  needed.  Maybe they are needed and maybe  
they would be handy. I am not disputing that.

The above should be simple to do however -- a program does an open of  
a file name "foo.bar".  ZFS / the file system routine would use the  
most recent version by default if no version info is given.
>
>>> Now we have a problem:  how do we access FV for non-local (e.g.
>>> SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is
>>> in the network file server arena,
>>
>> Assumption, and definitely not supported.   It is very useful outside
>> of the file sharing arena.
>
> I agree with you, and I agree with Erik.  We, Sun engineers that is,
> need to look at the big picture, and network access is part of the big
> picture.
Sure
>
>>> unless we can use FV over the network, it is useless.
>>
>> Wrong
>
> Yes, but we have to provide for it.
I never said that file sharing is not useful (in this or any  
context).  I just said that FV is not useless outside of the "over the  
network" use.  And if it did not support filesharing scenarios, at  
least in the beginning, it would still be of great use.  Just as  
Apache not supporting lockfiles on NFS file systems does not make  
Apache or NFS "useless", an FV that does not reach 100% into every nook
and cranny is not useless.

I would find it of tremendous use just in managing system and  
configuration files.
>
>>> You can't modify the SMB or NFS protocol (easily or quickly) to add
>>> FV functionality (look how hard it was to add ACLs to these
>>> protocols).
>>>
>>> About the only way I can think around this problem is to store
>>> versions in a special subdir of each directory (e.g. .zfs_version),
>>> which would then be browsable over the network, using tools not
>>> normally FV-aware.  But this puts us back into the problem of a
>>> directory which potentially has hundreds or thousands of files.
>>
>> This directory way of doing it is not a good way.  It fails the ease
>> of use to the end user test.
>
> No, it doesn't: it doesn't preclude having FV-aware UIs that make it
> easier to access versions.  All Erik's .zfs_version proposal is about is
> remote access, not a user interface.
one UI is the command line shell
>
>> The VMS way is far superior.  The problem is that you have to make
>> sure that apps that are not FV aware have no problems, which means
>> you cannot just append something to the actual file name. It has to
>> be some sort of meta data.
>
> I.e., APIs.
Well, file system level meta data that the file system uses may or  
may not need APIs to expose it -- depends on how the final  
implementation works.  However, I never came out against APIs
>
> The big question though is: how to snapshot file versions when they  
> are
> touched/created by applications that are not aware of FV?
Don't use the word snapshot as it may draw in unintended comparisons  
to snapshot features.
>
> Certainly not with every write(2).
no
> At fsync(2), close(2), open(2) for
> write/append?
probably
> What if an application deals in multiple files?
so?
> Etc...
>
> Automatically capturing file versions isn't possible in the general case
> with applications that aren't aware of FV.
In most cases it is possible.  At worst you make a copy on open and  
work on the copy, making it the most recent version.
>
>>> While this may indeed mean that you have all of your changes
>>> around, figuring out which version has them can be massively time-
>>> consuming.
>>
>> Your assumption.  (And much less hard than using snapshots).
>
> I agree that with ZFS snapshots it could be hard to find the file
> versions you want.  I don't agree that the same isn't true with FV
> *except* where you have FV-aware applications.
How so?  The shell / desktop is enough of a UI to deal with it.
>
>> Yes, any time you do a close() or equivalent. The idea is not to
>> implement a universal undo stack.
>
> Or open(2) for write, fsync(2)s, unlinks.  Maybe.  It could work for
> some apps and not for others.
See my comments above -- worst case is to copy the file on open and  
then do everything on the copy as normal.
>
> (I really wouldn't want building code to result in lots of file versions
> of intermediate and end-result files!)
No harm as they get deleted by the build process anyway.  And if you  
"enhance" the FV, you can set directories like scratch directories to
not allow more than 1 FV per file.

Chad
>
> Nico
> -- 
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net



David Dyer-Bennet

2006-Oct-06 22:11 UTC

[zfs-discuss] A versioning FS

On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> > On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> > >OK. So, now we're on to FV.  As Nico pointed out, FV is going to
> > >need a new API.  Using the VMS convention of simply creating file
> > >names with a version string afterwards is unacceptable, as it
> > >creates enormous directory pollution,
> >
> > Assumption, not supported.  "Eye of the  beholder."
>
> No, you really need an API, otherwise you have to guess when to snapshot
> versions of files.
First of all "snapshot versions of files" is a very confusing phrase,
especially in this discussion.  But, if you mean what I think you
mean, then the existing file API gives you all the information you
need.  Whenever you create a new file, you create a new version.  The
only thing that changes is, if an *old* version already exists, it
doesn't get deleted the way it used to.
> > >Now we have a problem:  how do we access FV for non-local (e.g.
> > >SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is
> > >in the network file server arena,
> >
> > Assumption, and definitely not supported.   It is very useful outside
> > of the file sharing arena.
>
> I agree with you, and I agree with Erik.  We, Sun engineers that is,
> need to look at the big picture, and network access is part of the big
> picture.
Yes, I have to agree here also.  So much of people's file access is
over a network these days that a local-only facility isn't very
interesting / useful.
> > >You can't modify the SMB or NFS protocol (easily or quickly) to add
> > >FV functionality (look how hard it was to add ACLs to these
> > >protocols).
> > >
> > >About the only way I can think around this problem is to store
> > >versions in a special subdir of each directory (e.g. .zfs_version),
> > >which would then be browsable over the network, using tools not
> > >normally FV-aware.  But this puts us back into the problem of a
> > >directory which potentially has hundreds or thousands of files.
> >
> > This directory way of doing it is not a good way.  It fails the ease
> > of use to the end user test.
>
> No, it doesn't: it doesn't preclude having FV-aware UIs that make it
> easier to access versions.  All Erik's .zfs_version proposal is about is
> remote access, not a user interface.
Requiring special software to access this kind of feature is death.
People don't want to learn new tools; they want to use existing
tools.  Depending on the user, that's ls, or awk, or grep, or find, or
Emacs dired, or this or that or the other thing.

One of the reasons ZFS snapshots (and other snapshots, in my limited
experience) work easily is that they appear as ordinary files within
the directory structure, and do *not* require special tools to access.
> > The VMS way is far superior.  The problem is that you have to make
> > sure that apps that are not FV aware have no problems, which means
> > you cannot just append something to the actual file name. It has to
> > be some sort of meta data.
>
> I.e., APIs.
I don't think I understand the issues being raised here.  My
off-the-cuff impression is that they don't exist at all, or are at
least moderate molehills not mountains.

When writing an application for TOPS-20 or VMS, you didn't have to do
anything to specifically deal with file versioning.  It just worked.
If the user wanted the most recent version of the file, they typed the
name without the version, or else with the most current version.  If
they *did* want an older version, they had to type very slightly more,
by appending the version number.  And (on TOPS-20) of course we had
filename completion and inline help to make it easy to refresh your
memory on what versions existed in the middle of doing this.

So, one small feature built into the filesystem OPEN code: if a
version is not specified for a file, use the most recent version.
NO special code in any application is needed.
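
That OPEN rule can be sketched in a few lines. The `name;N` suffix below follows the VMS style of writing a version number after the file name and is used here purely for illustration; the function name and the `existing` table are hypothetical, and nothing like this exists in ZFS.

```python
from typing import Dict, List, Tuple


def resolve(spec: str, existing: Dict[str, List[int]]) -> Tuple[str, int]:
    """Map a file spec to (name, version).

    "foo.bar"   -> the most recent version of foo.bar
    "foo.bar;2" -> exactly version 2, if it exists
    """
    name, has_version, version_text = spec.partition(";")
    versions = existing.get(name)
    if not versions:
        raise FileNotFoundError(name)
    if not has_version:
        # No version given: an unaware application just gets the newest one.
        return name, max(versions)
    number = int(version_text)
    if number not in versions:
        raise FileNotFoundError(spec)
    return name, number


existing = {"foo.bar": [1, 2, 3]}
latest = resolve("foo.bar", existing)
older = resolve("foo.bar;2", existing)
```

An application that only ever passes plain names never sees a version number, which is the sense in which no special application code is needed.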

There are public-access TOPS-20 systems on the net today (I've got an
account on one, though that data is at home and I'm in Palo Alto this
week).  And I've still got the small TOPS-20 system manual (I didn't
keep the big twenty-something volume set though) where I can look up
the details when I'm home.  This technology isn't completely lost yet
:-).
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Erik Trimble

2006-Oct-06 22:43 UTC

[zfs-discuss] A versioning FS

Chad,

I think our problem is that we look at FV from different angles. I look 
at it from the point of view of people who have NEVER used FV, and you 
look at it from the view of people who have ALWAYS used FV.

For those of us who have never had FV available, technical users have 
used VC tools for important files forever (scripts, config files, etc), 
and will continue to use VC for those purposes, even if FV is 
implemented, as VC has decided advantages for these uses (history, 
management, etc.).   For the technical user, FV is primarily useful 
when editing documents which were never put under VC in the pre-FV 
era.   This is virtually identical to the usage for "naive" users.  That
is, FV is highly useful for keeping multiple copies of documents under 
active editing.


In order for an FV implementation to be useful for this stated purpose, 
it must fulfill the following requirements:

(1)  Clean interface for users.  That is, one must NOT be presented with 
a complete list of all versions unless explicitly asked for it, and it 
should be simple to select a version based on some reasonable criteria 
(date of creation/modification, version number, etc.)

(2)  Simple way to decide if a file should be versioned or not. Either 
automatically version all files (or none at all), or provide a mechanism 
to turn FV on/off on a per-file or per-directory basis.

(3)  Network-FS awareness.  Without this, FV is severely limited. Given
my preconditions above (that is, the current usage pattern of us in the
non-FV world), limiting FV to those on the local system restricts its
usefulness to the point where it isn't worth the effort.


So, we have two scenarios for the implementation here:

(a)  FV requires no special API, and all programs using the Filesystem 
automatically have access to versions

(b)  FV uses a new API, so versions are only available to applications 
using the new API


For case (a), you are going to have to store the versions as files 
_somewhere_, in which case you run into the "directory pollution" 
problem I quote (if you store the versions next to the "current" 
version), or the "where is my version" problem that you quote w/r/t 
snapshots (if you store them elsewhere).

In case (b), you will have to re-write _all_ FS-access apps to make them 
FV-aware, in the same manner work had to be done to make apps 
ACL-aware.  And, to get requirement (3) above, you have to modify the 
network FS protocols to support the API calls.


Also, regardless of which implementation mechanism you use (a) or (b), 
you will need some sort of tool to indicate which files are to be 
versioned (to satisfy requirement (2) above), how many versions are to 
be kept, and other FV administration utilities.  These tools will all 
need to be netFS-aware/usable.


Disk space consumption is NOT irrelevant. Else, why is there so much
concern around the ZFS compression project?  Disk is NOT cheap - on the
desktop, yes, but I'm sorry, networked disk systems are not really
cheap, and tape archivers less so.  Allocating several GB of disk space
per end-user is not uncommon, so 1000 users requires multi-terabyte
systems just for "normal" storage (i.e. no
backups/versions/snapshots/archives).  Take a look at what a typical
system costs:  $10+/GB for workgroup-level storage (Sun 3510FC class,
1-20TB), $30+/GB for nice mid-level SAN storage arrays
(Sun 6920-class, >10TB).  If I have to increase my storage requirements
25-50% for FV, most of which is unused versions, this is a decidedly
non-trivial amount.  This applies as well to the 5-second snapshot
proposal.
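
As a back-of-the-envelope check on those figures, the sketch below multiplies them out. The 5 GB per-user allocation is an assumption standing in for "several GB"; the per-GB prices and the 25-50% overhead are the ones quoted above.

```python
# Rough cost of the extra storage that file versions would consume.
USERS = 1000
GB_PER_USER = 5  # assumption: stands in for "several GB" per end-user


def fv_overhead_cost(price_per_gb, overhead_fraction):
    """Dollar cost of the additional space taken up by versions."""
    base_gb = USERS * GB_PER_USER
    return base_gb * overhead_fraction * price_per_gb


for label, price in (("workgroup, $10/GB", 10), ("mid-level SAN, $30/GB", 30)):
    low = fv_overhead_cost(price, 0.25)
    high = fv_overhead_cost(price, 0.50)
    print(f"{label}: ${low:,.0f} to ${high:,.0f} extra")
```

Under these assumptions the overhead alone runs from the low tens of thousands of dollars upward, before backups or snapshots are counted.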


For source code, FV isn't really needed - the problem has already been
solved.  If your particular VC/editor/IDE doesn't handle the problem
correctly, then switch.  There are many VC and IDE combinations on all 
platforms which provide a solution to the same problem FV solves.  
Mercurial, RationalRose, BitKeeper, Git, and others on the VC side; 
NetBeans, CodeWarrior, Visual Studio, and even Emacs can be configured 
to handle the problem on the IDE side.


-Erik

Nicolas Williams

2006-Oct-06 23:15 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
> >On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net
> >LLC wrote:
> >>On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> >>>OK. So, now we're on to FV.  As Nico pointed out, FV is going to
> >>>need a new API.  Using the VMS convention of simply creating file
> >>>names with a version string afterwards is unacceptable, as it
> >>>creates enormous directory pollution,
> >>
> >>Assumption, not supported.  "Eye of the beholder."
> >
> >No, you really need an API, otherwise you have to guess when to
> >snapshot versions of files.
>
> What does "snapshot versions of files" mean?

The act of creating file versions a la VMS.

> My line "Assumption, not supported.  "Eye of the beholder"" was in
> reference to "enormous directory pollution"

Ah.  'Twasn't clear.
> >
> >>>not to mention user confusion.
> >>
> >>Assumption, not supported.
> >
> >Maybe Erik would find it confusing.  I know I would find it
> >_annoying_.
>
> Then leave it set to 1 version

Per-directory?  Per-filesystem?

> >>>So, FV has to be invisible to non-aware programs.
> >>
> >>yes
> >
> >Interesting that you agree with this when you disagree with Erik's
> >other points!  To me this statement implies FV APIs.
>
> It has to do with the implementation details.  I don't know what sort
> of APIs you are saying are needed.  Maybe they are needed and maybe
> they would be handy. I am not disputing that.
>
> The above should be simple to do however -- a program does an open of
> a file name "foo.bar".  ZFS / the file system routine would use the
> most recent version by default if no version info is given.

How can version information be given without changing the APIs or
putting the version number/string into the file name?

Putting the version number/string into the file name is hard for me to
accept.  It's what would lead to polluting my directories.

Now, if the default is 1 version (i.e., keep the current version only),
then I might live with it because I'd never change that setting.

But if we don't encode the version number/string in the file name and
instead enhance APIs and UIs so that by default I can keep N>1 versions
without them polluting my directories, THEN I would set N>1.

> one UI is the command line shell

Indeed!  And command-line tools, like ls(1), find(1), etc...

What I'm saying is that I'd like to be able to keep multiple versions of
my files without "echo *" or "ls" showing them to me by default.

I'd like an option for ls(1), find(1) and friends to show file versions,
and a way to copy (or, rather, un-hide) selected file versions so that
I could then refer to them as usual -- when I do this I don't care to see
version numbers in the file name, I just want to give them names.

And, maybe, I'd like a way to write globs that match file versions
(think of extended globbing, as in KSH).

GUIs would, presumably, have a way to show/hide file versions, search for
them, select them, etc...
> >Certainly not with every write(2).
>
> no

Good.

> >At fsync(2), close(2), open(2) for
> >write/append?
>
> probably

Which?

> >What if an application deals in multiple files?
>
> so?

So, file versions aren't useful unless the application explicitly
tells the OS when to make them.

Similarly with applications that keep files open but keep writing
transactions in ways that the OS can't isolate without input from the
app.  E.g., databases.  fsync(2) helps here, but lots and lots of
fsync(2)s would result in no useful versioning.

Nico
--

David Dyer-Bennet

2006-Oct-07 00:11 UTC

[zfs-discuss] A versioning FS

On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> > On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
> > >Maybe Erik would find it confusing.  I know I would find it
> > >_annoying_.
> >
> > Then leave it set to 1 version
>
> Per-directory?  Per-filesystem?

Whatever.  What's the actual issue here?

I don't recall that on TOPS-20 it was possible to not version.  What
you could do is set your logout.cmd file to purge your space down to
one copy when you logged out.

This worked fine for the users I knew; even on a system that didn't
have as much as a gigabyte of disk storage total to support a few
dozen software engineers.
> > The above should be simple to do however -- a program does an open of
> > a file name "foo.bar".  ZFS / the file system routine would use the
> > most recent version by default if no version info is given.
>
> How can version information be given without changing the APIs or
> putting the version number/string into the file name?

The version number is part of the file name in all the examples I know
about.  I'd find it useless without that; it has to be a real part of
the filesystem, usable by everybody, not a special addon accessible
only with one or two dedicated applications.

> Putting the version number/string into the file name is hard for me to
> accept.  It's what would lead to polluting my directories.

Set your ls default to not show versions.  Isn't the problem then
solved?  Maybe add that option to the GUI filesystem explorer as well.

In practice, it never was a problem that I noticed, or that other
people noticed.  And remember that this was on slower systems with
smaller screens and often rather slower screen update.

Do you not like the idea based on theory, or did you actually use
TOPS-20 for a while and find the versioning troublesome?
> > one UI is the command line shell
>
> Indeed!  And command-line tools, like ls(1), find(1), etc...
>
> What I'm saying is that I'd like to be able to keep multiple versions of
> my files without "echo *" or "ls" showing them to me by default.

And I find that completely unacceptable; useless.  The whole point of
putting versioning in the filesystem is that that makes it accessible
to all programs.

> > >What if an application deals in multiple files?
> >
> > so?
>
> So, file versions aren't useful unless the application explicitly
> tells the OS when to make them.

File versions are created when a file is created.  In the scenario
where, today, an existing file would be overwritten (deleted), instead
the old file is kept and the new file is given the version number +1
of the old file.
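
The create-time rule just described can be modelled in a few lines. This is a toy in-memory sketch, not how TOPS-20, VMS, or ZFS stores anything; the class name `VersionedDir` and its methods are hypothetical.

```python
class VersionedDir:
    """Toy directory where creating an existing name keeps the old file."""

    def __init__(self):
        self._files = {}  # name -> {version number: contents}

    def create(self, name, data):
        """Store data as a new version: highest existing version + 1."""
        versions = self._files.setdefault(name, {})
        new_version = max(versions, default=0) + 1
        versions[new_version] = data
        return new_version

    def read(self, name, version=None):
        """With no version given, return the most recent one."""
        versions = self._files[name]
        if version is None:
            version = max(versions)
        return versions[version]

    def purge(self, name, keep=1):
        """Drop all but the newest `keep` versions (a logout-style cleanup)."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        versions = self._files[name]
        for old in sorted(versions)[:-keep]:
            del versions[old]


d = VersionedDir()
d.create("foo.bar", "first draft")
d.create("foo.bar", "second draft")
print(d.read("foo.bar"))             # newest version
print(d.read("foo.bar", version=1))  # the one that would have been overwritten
```

Note that a program which only ever calls `read("foo.bar")` never sees a version number, which is the "most recent by default" behaviour discussed earlier in the thread.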
> Similarly with applications that keep files open but keep writing
> transactions in ways that the OS can't isolate without input from the
> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
> fsync(2)s would result in no useful versioning.

None of those are candidates for file versioning, and a darned good thing, too.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Joseph Mocker

2006-Oct-07 01:17 UTC

[zfs-discuss] A versioning FS

Nicolas Williams wrote:
>On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
>>On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>>OK. So, now we're on to FV.  As Nico pointed out, FV is going to
>>>need a new API.  Using the VMS convention of simply creating file
>>>names with a version string afterwards is unacceptable, as it
>>>creates enormous directory pollution,
>>
>>Assumption, not supported.  "Eye of the beholder."
>
>No, you really need an API, otherwise you have to guess when to snapshot
>versions of files.

David Dyer-Bennet's post gives a hint of how this could be done without
any API. Simply augment a few system calls like open(), unlink(), etc.
Calls that can potentially change files. Since you can't change a file
unless it is open()'ed with various write flags like O_WRONLY, O_RDWR,
etc., this could be an ideal place to create the version.

One could probably write a "poor man's" FV LD_PRELOAD library to do this
without the filesystem's knowledge at all.

It wouldn't be as efficient with space as could be done at the
filesystem level, but as someone said, disk is cheap.
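
A real LD_PRELOAD shim would be written in C and interpose on open(2). The same idea can be sketched at the library level in Python: wrap `open()` and copy the existing file aside before any writable open. The wrapper name `versioned_open` and the `;N` suffix are assumptions made for illustration.

```python
import os
import shutil
import tempfile

WRITE_MODE_CHARS = set("wa+")  # any of these means the file may be modified


def next_version_path(path):
    """Return the first unused 'path;N' name (the ';N' suffix is hypothetical)."""
    n = 1
    while os.path.exists(f"{path};{n}"):
        n += 1
    return f"{path};{n}"


def versioned_open(path, mode="r", **kwargs):
    """Like open(), but preserve the current contents before a writable open."""
    if WRITE_MODE_CHARS & set(mode) and os.path.isfile(path):
        # Full copy: simple, but not space-efficient, as the post concedes.
        shutil.copy2(path, next_version_path(path))
    return open(path, mode, **kwargs)


# Usage: each writable open leaves the previous contents behind as a copy.
workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "notes.txt")
with versioned_open(doc, "w") as f:
    f.write("draft one\n")
with versioned_open(doc, "w") as f:
    f.write("draft two\n")
print(sorted(os.listdir(workdir)))
```

The copies land next to the original, so this sketch has exactly the directory-pollution property debated above.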

Joseph Mocker

2006-Oct-07 01:22 UTC

[zfs-discuss] A versioning FS

Nicolas Williams wrote:
>
>The big question though is: how to snapshot file versions when they are
>touched/created by applications that are not aware of FV?
>
>Certainly not with every write(2).  At fsync(2), close(2), open(2) for
>write/append?  What if an application deals in multiple files?  Etc...
>
>Automatically capturing file versions isn't possible in the general case
>with applications that aren't aware of FV.

Don't snapshots have the same problem? A snapshot could potentially be
taken when a file is partially written or updated, no?

For example, I start to write a large file, zfs's buffers fill up and it
flushes them to disk during the middle of the file I'm writing. If a
snapshot came along at about the same time, the file would be
incomplete/corrupt, no?

Erik Trimble

2006-Oct-07 01:33 UTC

[zfs-discuss] A versioning FS

David Dyer-Bennet wrote:
> On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
>
>> > >Maybe Erik would find it confusing.  I know I would find it
>> > >_annoying_.
>> >
>> > Then leave it set to 1 version
>>
>> Per-directory?  Per-filesystem?
>
> Whatever.  What's the actual issue here?
>
> I don't recall that on TOPS-20 it was possible to not version.  What
> you could do is set your logout.cmd file to purge your space down to
> one copy when you logged out.

But see, that assumes you have a logout-type functionality to use. Which
indeed is possible for command-line usage, but then only in a very
limited way.   During a typical session, I access almost 20 NFS-mounted
directories. And anyone using autofs/automount trees gets even more.
You're saying that my logout script has to know about all of them to
keep things clean?  That's unrealistic.  And that still doesn't solve
the problem of people who use SAMBA or NFS from machines which don't
have an interactive shell logout system (i.e. Windows).
> This worked fine for the users I knew; even on a system that didn't
> have as much as a gigabyte of disk storage total to support a few
> dozen software engineers.

The problem is we are comparing apples to oranges in user bases here.
TOPS-20 systems had a couple of dozen users (or, at most, a few
hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of
thousands.  Plus, the number of files being created under typical modern
systems is at least two (and probably three or four) orders of magnitude
greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000
under my home directory.  And I don't have anything significant in my
/home (no source code, no build/test trees, just misc business stuff).
What is manageable with a few files quickly becomes unwieldy with more
than a few dozen.

This is what Nico and I are talking about:  if you turn on file 
versioning automatically (even for just a directory, and not a whole 
filesystem), the number of files being created explodes geometrically.
>> > The above should be simple to do however -- a program does an open of
>> > a file name "foo.bar".  ZFS / the file system routine would use the
>> > most recent version by default if no version info is given.
>>
>> How can version information be given without changing the APIs or
>> putting the version number/string into the file name?
>
> The version number is part of the file name in all the examples I know
> about.  I'd find it useless without that; it has to be a real part of
> the filesystem, usable by everybody, not a special addon accessible
> only with one or two dedicated applications.
>
>> Putting the version number/string into the file name is hard for me to
>> accept.  It's what would lead to polluting my directories.
>
> Set your ls default to not show versions.  Isn't the problem then
> solved?  Maybe add that option to the GUI filesystem explorer as well.

But this requires modifying all the relevant apps, which is the same
amount of work as modifying them to use a new FV API.  It's not
transparent to the end-user.
> In practice, it never was a problem that I noticed, or that other
> people noticed.  And remember that this was on slower systems with
> smaller screens and often rather slower screen update.
>
> Do you not like the idea based on theory, or did you actually use
> TOPS-20 for a while and find the versioning troublesome?

Putting the file version number as part of the file name breaks things.
Apps unaware of the special significance of this format will tend to
write similar names, which can screw everything royally.

Example:

Say we use <file>;<version>

In emacs, I edit FOO;2

it will write out a temp file "FOO;2~".  So, how does the FS deal with
this the next time it needs to create a new version?

The problem lies in that under VMS, the ';' was a special character, and
unusable in normal naming. I suspect a similar situation exists under
TOPS-20.  No such luck in a POSIX filesystem - nearly all printable (and
many unprintable) characters are valid for use in filenames. So you
_CAN'T_ use them to delineate File Versioning, without risking blowing
the entire scheme when some random app decides to either use your FV
marker for its own needs, or something similar to the emacs case above.
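
The ambiguity is easy to reproduce. The sketch below creates ordinary files whose names merely look like versions, then runs a naive `<file>;<version>` parser over them. The parser `split_version` and the sample names are hypothetical.

```python
import os
import tempfile


def split_version(name, marker=";"):
    """Naively split 'base;N' into (base, N); (name, None) if no numeric suffix."""
    base, sep, suffix = name.rpartition(marker)
    if sep and suffix.isdigit():
        return base, int(suffix)
    return name, None


workdir = tempfile.mkdtemp()
# An FV-unaware program is free to create any of these as plain files.
for name in ("FOO;2", "FOO;2~", "report;2006"):
    with open(os.path.join(workdir, name), "w") as f:
        f.write("ordinary file\n")

for name in sorted(os.listdir(workdir)):
    print(name, "->", split_version(name))
```

Here "report;2006" is misread as version 2006 of "report", while the editor backup "FOO;2~" is not associated with any version at all.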


>> > one UI is the command line shell
>>
>> Indeed!  And command-line tools, like ls(1), find(1), etc...
>>
>> What I'm saying is that I'd like to be able to keep multiple versions of
>> my files without "echo *" or "ls" showing them to me by default.
>
> And I find that completely unacceptable; useless.  The whole point of
> putting versioning in the filesystem is that that makes it accessible
> to all programs.

But, because of the explosion in the number of files, you CAN'T
automatically show all versions. Users will NEVER accept this. The only
clean way to do this is to show file versions only upon request. Not by
default.

>> > >What if an application deals in multiple files?
>> >
>> > so?
>>
>> So, file versions aren't useful unless the application explicitly
>> tells the OS when to make them.
>
> File versions are created when a file is created.  In the scenario
> where, today, an existing file would be overwritten (deleted), instead
> the old file is kept and the new file is given the version number +1
> of the old file.
>
>> Similarly with applications that keep files open but keep writing
>> transactions in ways that the OS can't isolate without input from the
>> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
>> fsync(2)s would result in no useful versioning.
>
> None of those are candidates for file versioning, and a darned good
> thing, too.

Honestly, as far as file versioning goes, the time to make a new version
is when calling open() with the appropriate arguments to allow for
append or modification. You obviously don't want to create a new version
if you are only opening a file for read-only access, and changing
version on fsync() is ludicrous, and on close() doesn't differentiate
between a file which has been modified or not.

Given this, we're back into the problem FV is supposed to solve.   It is
entirely possible for an editor to keep open a file for a long time,
periodically writing out your changes without issuing a new open().
Word with auto-save turned off is a prime example.   Given this, you've
only created a new version when you first load the document, and all
your intermediary changes are lost, since it only saves the document on
close().   Thus, in order to get benefits from FV, your editor must
issue periodic close() and open() commands on the same file, as you
edit, all without your intervention.  Exactly how many editors do this?
I have no idea.  So, the only way to enable FV is to require the user to
periodically push the "Save" button. How is that any different from
the current situation?
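
The "version on writable open()" rule, and the long-lived-editor problem that follows from it, can be sketched as below. The function names and the event list are hypothetical; only the `os.O_*` flag constants are real.

```python
import os


def open_should_version(flags):
    """True if these open(2) flags permit the file's contents to change."""
    # O_RDONLY is 0, so any access-mode bit being set means write access.
    return bool(flags & (os.O_WRONLY | os.O_RDWR))


def versions_created(events):
    """Count versions made under the rule, given (call name, flags) events."""
    return sum(
        1 for call, flags in events
        if call == "open" and open_should_version(flags)
    )


# An editor that opens once and saves many times without reopening.
editor_session = [
    ("open", os.O_RDWR),
    ("write", 0),
    ("write", 0),
    ("write", 0),
    ("close", 0),
]
print(versions_created(editor_session))
```

One open, three saves, one version: the intermediate saves are invisible to the rule, which is the limitation described above.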

-Erik

Chad Leigh -- Shire.Net LLC

2006-Oct-07 01:37 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
>
> This is what Nico and I are talking about:  if you turn on file  
> versioning automatically (even for just a directory, and not a  
> whole filesystem), the number of files being created explodes  
> geometrically.

But it doesn't.  Unless you are editing geometrically more files.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




Erik Trimble

2006-Oct-07 01:40 UTC

[zfs-discuss] A versioning FS

Joseph Mocker wrote:
> Nicolas Williams wrote:
>
>> The big question though is: how to snapshot file versions when they are
>> touched/created by applications that are not aware of FV?
>>
>> Certainly not with every write(2).  At fsync(2), close(2), open(2) for
>> write/append?  What if an application deals in multiple files?  Etc...
>>
>> Automatically capturing file versions isn't possible in the general case
>> with applications that aren't aware of FV.
>
> Don't snapshots have the same problem. A snapshot could potentially be
> taken when a file is partially written or updated, no?
>
> For example, I start to write a large file, zfs's buffers fill up and
> it flushes them to disk during the middle of the file I'm writing.  If
> a snapshot came along at about the same time, the file would be
> incomplete/corrupt, no?

The developers can answer this definitively, but I believe the answer to
your questions is NO.  That is, if there is anything in the buffer
waiting to be written when a snapshot request comes along, the buffer is
written out so that the file is consistent with the last write().  So,
snapshotting should NEVER cause a file corruption in this manner. That
said, if you are doing the following:

1. App issues write() for data A
2. snapshot request
3. App issues write() for data B

Then yes, the snapshot file will only contain data A, and not data B,
which might lead to an inconsistency in the app's behavior, if both A
and B were important to be written together.  But if that were the case,
then the app should have written A and B atomically.

So, if you are writing to a file, it works better to write everything at 
once in a stream, rather than a character (or byte) at a time. :-)
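
The post does not say how an application would write A and B atomically. One common technique, shown here as a sketch and not as anything ZFS requires, is to write everything to a temporary file and rename it over the target, so the final name only ever refers to a complete file. The helper name `write_atomically` is hypothetical.

```python
import os
import tempfile


def write_atomically(path, chunks):
    """Write all chunks to a temp file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            for chunk in chunks:  # A, B, ... may arrive piecemeal
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


target = os.path.join(tempfile.mkdtemp(), "data.txt")
write_atomically(target, ["data A\n", "data B\n"])
with open(target) as f:
    print(f.read(), end="")
```

With this pattern the data never has to be buffered in memory as a whole; a point-in-time copy taken mid-write would at worst contain an incomplete temporary file next to the old, intact target.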

-Erik

Joseph Mocker

2006-Oct-07 01:51 UTC

[zfs-discuss] A versioning FS

Erik Trimble wrote:
>
> The developers can answer this definitively, but I believe the answer 
> to your questions is NO.  That is, if there is anything in the buffer 
> waiting to be written when a snapshot request comes along, the buffer 
> is written out so that the file is consistent with the last write().  
> So, snapshotting should NEVER cause a file corruption in this matter. 
> That said, if you are doing the following:
>
> 1. App issues write() for data A
> 2. snapshot request
> 3. App issues write for data B
>
> Then yes, the snapshot file will only contain data A, and not data B,
> which might lead to an inconsistency in the app's behavior, if both A
> and B were important to be written together.

Yes, this is what I was talking about.

> But if that were the case, then the app should have written A and B
> atomically.

And how realistic is that? You are suggesting, for example, that every
application that writes an XML file should buffer the _entire_ XML
stream in memory and issue a single atomic write of that entire
document.  That's not realistic, and in some cases not even possible.

Otherwise, uh, we better fix sed then :-)

--joe

Chad Leigh -- Shire.Net LLC

2006-Oct-07 02:12 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
> David Dyer-Bennet wrote:
>> On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
>>
>>> > >Maybe Erik would find it confusing.  I know I would find it
>>> > >_annoying_.
>>> >
>>> > Then leave it set to 1 version
>>>
>>> Per-directory?  Per-filesystem?
>>
>> Whatever.  What's the actual issue here?
>>
>> I don't recall that on TOPS-20 it was possible to not version.  What
>> you could do is set your logout.cmd file to purge your space down to
>> one copy when you logged out.
>
> But see, that assumes you have a logout-type functionality to use.
> Which indeed is possible for command-line usage, but then only in a
> very limited way.   During a typical session, I access almost 20
> NFS-mounted directories. And anyone using autofs/automount trees
> gets even more. You're saying that my logout script has to know
> about all of them to keep things clean?  That's unrealistic.

It is up to you to come up with a scheme to keep things clean, the
same way you do now anyway (downloads, etc.).

>   And that still doesn't solve the problem of people who use SAMBA
> or NFS from machines which don't have an interactive shell logout
> system (i.e. Windows).

It is still mounted on their desktops and they can still delete files
with FV the same way they do now.

No real issue.
>
>> This worked fine for the users I knew; even on a system that didn't
>> have as much as a gigabyte of disk storage total to support a few
>> dozen software engineers.
>>
> The problem is we are comparing apples to oranges in user bases
> here. TOPS-20 systems had a couple of dozen users (or, at most, a
> few hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s
> of thousands.

Rarely.  Most of them have user counts in the same range as VMS, now
or then.

> Plus, the number of files being created under typical modern
> systems is at least two (and probably three or four) orders of
> magnitude greater.  I've got 100,000 files under /usr in Solaris,

so?  You are not editing these, are you?

> and almost 1,000 under my home directory.

again, FV only matters when you edit them

> And I don't have anything significant in my /home (no source code,
> no build/test trees, just misc business stuff).   What is manageable
> with a few files quickly becomes unwieldy with more than a few dozen.

I think you admitted you had not used FV before.  Is that the case?
Then how can you speak about what becomes unwieldy?

FV is not any more unwieldy with 1000 files in a dir than with 10.
Most people are not editing the 1000 files sitting in their directory.

>
> This is what Nico and I are talking about:  if you turn on file
> versioning automatically (even for just a directory, and not a
> whole filesystem), the number of files being created explodes
> geometrically.

Again, it does not.  Files are only versioned when they are edited.
>
>>> > The above should be simple to do however -- a program does an
>>> > open of a file name "foo.bar".  ZFS / the file system routine
>>> > would use the most recent version by default if no version info
>>> > is given.
>>>
>>> How can version information be given without changing the APIs or
>>> putting the version number/string into the file name?
>>
>> The version number is part of the file name in all the examples I
>> know about.  I'd find it useless without that; it has to be a real
>> part of the filesystem, usable by everybody, not a special addon
>> accessible only with one or two dedicated applications.
>>
>>> Putting the version number/string into the file name is hard for
>>> me to accept.  It's what would lead to polluting my directories.
>>
>> Set your ls default to not show versions.  Isn't the problem then
>> solved?  Maybe add that option to the GUI filesystem explorer as
>> well.
>>
> But this requires modifying all the relevant apps, which is the
> same amount of work as modifying them to use a new FV API.  It's
> not transparent to the end-user.

That is because the semantics of a file name are different on a
unix/posix system than they are on a VMS or TOPS-20 system, which had
more structured filenames.  I would say that the version cannot be an
actual part of the file name but would have to be meta data.
However, it could display as part of the filename and the underlying
system can be made to do the right thing

ie,

"foo" gets you the latest "foo"

Specifically entering foo;7 gets you version 7, or the latest if
there are fewer than 7 versions available.  The app can think of it as
being part of the file name, but the underlying system would have to
know how to do the right thing in extracting the version out and
making it meta data.  Takes some thinking and I am not claiming to
have all the answers right now, but hardly undoable.

No app changes are necessary.
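
The lookup rule proposed here can be sketched as follows. The `store` layout (base name mapped to a set of version numbers held as metadata) and the function name `resolve` are hypothetical. Falling back to the latest version whenever the requested one is missing is an assumption that generalises the "fewer than 7 versions" case.

```python
def resolve(typed_name, store, marker=";"):
    """Map what the user typed to (base name, version number).

    `store` maps each base name to the set of version numbers kept for it.
    """
    base, sep, suffix = typed_name.rpartition(marker)
    if sep and suffix.isdigit() and base in store:
        requested = int(suffix)
        versions = store[base]
        # Assumption: any missing version falls back to the latest one.
        return base, (requested if requested in versions else max(versions))
    # No usable version suffix: treat the whole string as the name.
    return typed_name, max(store[typed_name])


store = {"foo": {1, 2, 3}}
print(resolve("foo", store))
print(resolve("foo;2", store))
print(resolve("foo;7", store))
```

Because the version lives in metadata, a real file literally named "foo;2" would still collide with this syntax whenever "foo" also exists, which is the escaping problem raised in the replies.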

>
>> In practice, it never was a problem that I noticed, or that other
>> people noticed.  And remember that this was on slower systems with
>> smaller screens and often rather slower screen update.
>>
>> Do you not like the idea based on theory, or did you actually use
>> TOPS-20 for a while and find the versioning troublesome?
>>
> Putting the file version number as part of the file name breaks
> things. Apps unaware of the special significance of this format
> will tend to write similar names, which can screw everything royally.
> Example:
>
> Say we use <file>;<version>
>
> In emacs, I edit FOO;2
>
> it will write out a temp file "FOO;2~".  So, how does the FS deal
> with this the next time they need to create a new version?
>
> The problem lies in that under VMS, the ';' was a special
> character, and unusable in normal naming. I suspect a similar
> situation exists under TOPS-20.  No such luck in a POSIX filesystem
> - all printable (and many unprintable) characters are valid for use
> in filenames. So you _CAN'T_ use them to delineate File Versioning,
> without risking blowing the entire scheme when some random app
> decides to either use your FV marker for its own needs, or
> something similar to the emacs case above.

Yes, this needs to be thought about but is hardly a show stopper.
There are most likely many possible solutions that will work for most
people, and if you make it configurable then those people who run
into issues can reconfigure it.  Ie, say you do use
<file>;<version> and that proves unworkable in a specific case, then
the system can be reconfigured to display / decode using a different
character. Or perhaps, in that case, the user needs to supply a \
character in front of a ; that exists in a real file name so that it
is not decoded as a version identifier.
>
>
>
>>> > one UI is the command line shell
>>>
>>> Indeed!  And command-line tools, like ls(1), find(1), etc...
>>>
>>> What I'm saying is that I'd like to be able to keep multiple
>>> versions of my files without "echo *" or "ls" showing them to me
>>> by default.
>>
>> And I find that completely unacceptable; useless.  The whole point of
>> putting versioning in the filesystem is that that makes it accessible
>> to all programs.
>>
> But, because of the explosion in the number of files,

There is no explosion. You have not made any case except your claim
in mail that such an explosion is real and not just your personal
fear.  Remember, only files that are edited/changed are FVed.

> you CAN'T automatically show all versions.

Sure you can.

> Users will NEVER accept this.

Have you done usability testing?  There is no explosion of files like
you claim so most users would probably not object.

> The only clean way to do this is to show file versions only upon
> request. Not by default.

Your claim.  I claim otherwise.  You'd have to do some real testing
to see if it really is a problem.
>
>
>>> > >What if an application deals in multiple files?
>>> >
>>> > so?
>>>
>>> So, file versions aren''t useful unless the application
explicitly
>>> decides tells the OS when to make them.
>>
>> File versions are created when a file is created.  In the scenario
>> where, today, an existing file would be overwritten (deleted),  
>> instead
>> the old file is kept and the new file is given the version number +1
>> of the old file.
Exactly
>>
>>> Similarly with applications that keep files open but keep writing
>>> transactions in ways that the OS can''t isolate without
input from
>>> the
>>> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
>>> fsync(2)s would result in no useful versioning.
>>
>> None of those are candidates for file versioning, and a darned  
>> good thing, too.
>
> Honestly, as far as file versioning goes, the time to make a new  
> version is when calling open() with the appropriate arguments to  
> allow for append or modification.
exactly
> You obviously don''t want to create a new version if you are only  
> opening a file for read-only access, and changing version on fsync 
> () is ludicrous,
yes
> and on close() doesn't differentiate between a file which has been
> modified or not.
ok.  I am not an expert on low level file operations so I don't know
what knowledge is available about whether a file has been changed or not.

However, I'd have to think back to my VMS dev days -- I think that
when using the LSE editor, whenever I did a write it did create a new
version.  I cannot remember for sure.  Would have to find a VMS
system to test on.

This would have to be thought out some.
>
> Given this, we're back into the problem FV is supposed to solve.
> It is entirely possible for an editor to keep open a file for a  
> long time, periodically writing out your changes without issuing a  
> new open().  Word with auto-save turned off is a prime example.
ok
> Given this, you've only created a new version when you first load
> the document, and all your intermediary changes are lost, since it  
> only saves the document on close().
ok.  FV is not a panacea for all problems.  But most people do not sit
there with a file open forever.  FV solves a lot of problems.  It
still acts as a checkpoint for file edits -- especially for most
people's standard usage of open a file, edit it, close it, go watch
TV, think of something else, come back in and open again and edit it,
etc.
>   Thus, in order to get benefits from FV, your editor must issue  
> periodic close() and open() commands on the same file, as you edit,  
> all without your intervention.
No, you get the benefits of FV, just across editing sessions and not  
internal to an editing session.
> Exactly how many editors do this?  I have no idea.  So, the only  
> way to enable FV is to require the user to periodically push the  
> "Save" button. Which is how much more different than the current
> situation?
I edit a file.  I realize I screwed up.  I can go back to the  
previous version (or 2 ago or whatever).  I cannot do that in the  
current situation.



Chad
>
> -Erik
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




Anton B. Rang

2006-Oct-07 03:42 UTC

[zfs-discuss] Re: A versioning FS

>I think our problem is that we look at FV from different angles. I look 
>at it from the point of view of people who have NEVER used FV, and you 
>look at it from the view of people who have ALWAYS used FV.
That's certainly a part of it. It's interesting reading this discussion, as
someone who used VMS heavily through about the mid-1980s and then became a UNIX
sysadmin. File versioning was one of the items I really missed. There are a lot
of interesting use cases (walk away from terminal, come back, quit emacs, get
prompted whether to save file -- go ahead, save it, use 'diff' to determine
what changed, and then delete the newly written file version if the changes are
unwanted).

Directory pollution really turned out not to be an issue in practice, perhaps
because the default version limit was relatively small (3 in VMS). It could be
set per-file or (IIRC) per-directory. If someone didn't want versioning at all,
they could just set their directory to use one version, and old versions simply
didn't exist. Alternatively, to see just the most recent versions, ';' would
refer to the most recent version, so 'dir ;' showed only the filenames and no
old versions (going from memory here).
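
To illustrate the version-limit idea described above, here is a minimal Python
sketch. It is not VMS code: the function name, the use of plain integers for
version numbers, and the default limit of 3 are assumptions taken from the
description in this message.

```python
def purge_old_versions(versions, limit=3):
    """Split version numbers into (kept, purged), keeping the newest `limit`.

    `versions` is any iterable of integer version numbers (hypothetical
    representation); higher numbers are newer.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(versions)
    kept = ordered[-limit:]      # the newest `limit` versions survive
    purged = ordered[:-limit]    # everything older would be deleted
    return kept, purged
```

With a limit of 1 only the current version survives, which corresponds to the
"no versioning at all" setting mentioned above.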

Having a delimiter character really did help. On UNIX, we have '/', though
POSIX prohibits, or at least highly discourages, its use.  Mac OS X uses '/' to
access named forks (aka named streams, aka extended attributes in the Solaris
sense). If I do 'ls xyz' I see just xyz.  If I do 'ls xyz/rsrc' and xyz has a
'resource fork' then I see that. No real reason why I couldn't do
'ls xyz/versions', or, preferably, 'ls -V' :-) and see versions.
'diff xyz xyz/-1' would diff xyz with its immediately preceding version;
'diff xyz xyz/2' with version 2.
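
The 'xyz/-1' and 'xyz/2' notation above could be resolved roughly as follows.
This is only a sketch in Python: the function name and the idea of passing the
existing version numbers as a sorted list are assumptions for illustration, not
anything that exists in ZFS or Mac OS X.

```python
def resolve_version(versions, spec):
    """Map a spec such as '-1' or '2' to one of the existing version numbers.

    `versions` is a sorted list of existing version numbers; the last entry
    is the current file. A negative spec counts back from the current
    version; a positive spec names a version number directly.
    """
    n = int(spec)
    if n < 0:
        index = len(versions) - 1 + n
        if index < 0:
            raise LookupError(f"no version {spec} relative to current")
        return versions[index]
    if n in versions:
        return n
    raise LookupError(f"version {n} does not exist")
```

So with versions 1 through 4 present, 'xyz/-1' would name version 3 and
'xyz/2' would name version 2.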

(This would be interesting to prototype and do some usability testing.)

I agree that we want a clean interface, that versions should be optional, and
that they should be exposed via the network. (My home directory is NFS-mounted.)

While disk space is not irrelevant, quotas really help in the multi-user
scenario. If someone is close to their limits, they can use
''purge'' (VMS syntax) to remove old versions of their files, or
delete specific versions.

I don't agree that version control systems solve the same problem as file
versioning. I don't want to check *every change* that I make into version
control -- it makes the history unwieldy. At the same time, if I make a change
that turns out to work really poorly, I'd like to revert to the previous code --
not necessarily the code which is checked in. (I suspect there may be some
versioning systems which allow intermediate versions to be deleted, and I just
haven't used them, but this still seems complex compared to only checking in
known-good code.)
 
 
This message posted from opensolaris.org

Anton B. Rang

2006-Oct-07 03:48 UTC

[zfs-discuss] Re: A versioning FS

>People are oriented to their files, not to snapshots.
True, though with NetApp-style snapshots, it's not that difficult to translate
'src/file.c' to '.snapshot/hourly.0/src/file.c' and see what it was like an
hour ago. I imagine that a syntax like '.snapshot/22:20/src/file.c' would also
be easy to use. (On the other hand, zfs currently requires knowledge of where
the file system is rooted, and knowledge of where the current directory is
within that filesystem, which IMHO is somewhat confusing to users and requires
far too much typing.)

I don't have an answer to your question about how to find an earlier version of
a file with snapshots, though given an intelligent file system, there's no
reason why we couldn't have a '.version' pseudodirectory (or the like) which
understood file changes and was virtually populated by analyzing differences in
the snapshots.
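
A rough sketch of how such a '.version' view might be derived from snapshots,
in Python. Everything here is hypothetical: the function name, the input
format (snapshot name plus the file's bytes, oldest first), and the use of a
content hash. A real filesystem would more likely compare block pointers or
birth times than hash file contents.

```python
import hashlib

def distinct_versions(snapshots):
    """Return the snapshot names at which a file's content changed.

    `snapshots` is a list of (snapshot_name, data) pairs, oldest first,
    where `data` is the file's bytes in that snapshot or None if the file
    did not exist there.
    """
    result = []
    last_digest = None
    for name, data in snapshots:
        if data is None:
            # File absent in this snapshot; a later reappearance counts as new.
            last_digest = None
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest != last_digest:
            result.append(name)
            last_digest = digest
    return result
```

Four hourly snapshots in which the file changed only once would therefore show
up as two entries in the pseudodirectory rather than four.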
 
 

Anton B. Rang

2006-Oct-07 03:50 UTC

[zfs-discuss] Re: A versioning FS

>Versioning cannot be automated; taking periodic snapshots != capturing
>application state.
But I think we have existence proofs of operating systems which do automate
versioning.

It's true that capturing a new version each time a file has been modified and
closed may not be perfect, but if it works for 99% of use cases, that's good
for almost everyone. We have a lot of 99% tools (even 'ls' is pretty useless in
a ten-million-file directory). If we introduce a new API, users won't see the
benefits because nobody is going to update all of vi, vim, emacs, rsync, ftp,
sed, cat, cp ....
 
 

Jonathan Edwards

2006-Oct-07 04:08 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 21:17, Joseph Mocker wrote:
> Nicolas Williams wrote:
>
>> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net  
>> LLC wrote:
>>
>>> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>>
>>>> OK. So, now we're on to FV.  As Nico pointed out, FV is going
>>>> to need a new API.  Using the VMS convention of simply creating
>>>> file names with a version string afterwards is unacceptable, as
>>>> it creates enormous directory pollution,
>>>>
>>> Assumption, not supported.  "Eye of the  beholder."
>>>
>>
>> No, you really need an API, otherwise you have to guess when to  
>> snapshot
>> versions of files.
>>
> David Dyer-Bennet's post gives a hint of how this could be done
> without any API. Simply augment a few system calls like open(),
> unlink(), etc. Calls that can potentially change files. Since you
> can't change a file unless it is open()'ed with various write flags
> like O_WRONLY, O_RDWR, etc, this could be an ideal place to create
> the version.
>
> One could probably write a "poor man's" FV LD_PRELOAD library to do
> this without the filesystem's knowledge at all.
With the stackable approach, versionfs does this with compression and
a number of other configurable policies; see:
http://filesystems.org/project-versionfs.html
> It wouldn't be as efficient with space as could be done at the
> filesystem level, but as someone said, disk is cheap.
true, but it's still finite - there's typically a notion of recycling
or cleaning that is introduced such as in elephant:
http://www.hpl.hp.com/personal/Alistair_Veitch/papers/elephant-hotos/index.html

or certain versioning implementations that have been written around
SAM-FS and its recycler policies using the archiver.log:
http://www.hmk-computer.com/docs/products/synstar_restoreme.htm
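
The "version on open() for write" idea quoted above can be sketched in user
space. The Python below is only an illustration of the policy, not an
LD_PRELOAD library and not how versionfs works: the wrapper name
versioning_open and the '.~N~' suffix for saved copies are assumptions made
for this example, and copying the whole file is exactly the space-inefficient
approach the quoted text warns about.

```python
import os
import shutil

# Access modes that allow the file to be modified.
WRITE_FLAGS = os.O_WRONLY | os.O_RDWR

def next_version_path(path):
    """Return the first unused 'path.~N~' name (hypothetical naming scheme)."""
    n = 1
    while os.path.exists(f"{path}.~{n}~"):
        n += 1
    return f"{path}.~{n}~"

def versioning_open(path, flags, mode=0o644):
    """Like os.open(), but save a copy of an existing file before it is
    opened with O_WRONLY or O_RDWR. Read-only opens create no version."""
    if flags & WRITE_FLAGS and os.path.exists(path):
        shutil.copy2(path, next_version_path(path))
    return os.open(path, flags, mode)
```

Note that this creates a version on every writable open, whether or not the
file is subsequently changed, which is one of the objections raised elsewhere
in this thread.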

.je

Richard Elling - PAE

2006-Oct-07 04:18 UTC

[zfs-discuss] A versioning FS

Erik Trimble wrote:
> The problem is we are comparing apples to oranges in user bases here.
> TOPS-20 systems had a couple of dozen users (or, at most, a few
> hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of
> thousands.
IIRC, I had about a dozen files under VMS, not counting versions.
>             Plus, the number of files being created under typical modern
> systems is at least two (and probably three or four) orders of magnitude
> greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000
> under my home directory.
wimp :-)  I count 88,148 in my main home directory.  I'll bet just
running gnome and firefox will get you in the ballpark of 1,000 :-/
  -- richard

Chad Leigh -- Shire.Net LLC

2006-Oct-07 04:20 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 10:18 PM, Richard Elling - PAE wrote:
> Erik Trimble wrote:
>> The problem is we are comparing apples to oranges in user bases  
>> here. TOPS-20 systems had a couple of dozen users (or, at most, a  
>> few hundred).  VMS only slightly more.  UNIX/POSIX systems have  
>> 10s of thousands.
>
> IIRC, I had about a dozen files under VMS, not counting versions.
You mean in your system?  There was a lot more than that...
>
>>             Plus, the number of files being created under typical  
>> modern systems is at least two (and probably three or four) orders  
>> of magnitude greater.  I've got 100,000 files under /usr in
>> Solaris, and almost 1,000 under my home directory.
>
> wimp :-)  I count 88,148 in my main home directory.  I'll bet just
> running gnome and firefox will get you in the ballpark of 1,000 :-/
None (well, maybe 1 or 2)  of which you edit and hence would not  
generate versions.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




Jonathan Edwards

2006-Oct-07 04:33 UTC

[zfs-discuss] Re: A versioning FS

On Oct 6, 2006, at 23:42, Anton B. Rang wrote:
> I don't agree that version control systems solve the same problem
> as file versioning. I don't want to check *every change* that I
> make into version control -- it makes the history unwieldy. At the
> same time, if I make a change that turns out to work really poorly,
> I'd like to revert to the previous code -- not necessarily the code
> which is checked in. (I suspect there may be some versioning
> systems which allow intermediate versions to be deleted, and I just
> haven't used them, but this still seems complex compared to only
> checking in known-good code.)
The use cases are somewhat different here.  I would venture to say  
that a *personal* file versioning system needs to be thought of  
differently from a *group* co-ordination formal version control  
system.  Of course there is a fair amount of overlap in both use  
cases particularly when you consider a global namespace and  
concurrent access problems as you can see in the cedar or plan9  
systems (fossil/venti):
http://portal.acm.org/citation.cfm?doid=42392.42398
http://cm.bell-labs.com/plan9/
And if we were to also consider dynamic linking and versioning for
deprecated functions, there's another whole level of parallel
backwards compatibility interface problems that become much
easier to approach.

While this is an FV discussion, I do believe that we need some sort
of clearer distinction between FV, VC, DR, CDP, and Snapshotting
structured around the usability cases and close/sync vs a forced
version mark/branch .. there's too much confusion in this space, often
with conflicting goals misapplied to solve similar problems.

.je

Ben Gollmer

2006-Oct-07 06:49 UTC

[zfs-discuss] (OT: SVN branches) A versioning FS

On Oct 6, 2006, at 12:18 PM, David Dyer-Bennet wrote:
> On 10/5/06, Wee Yeh Tan <weeyeh at gmail.com> wrote:
>> On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
>> > One of the big problems with CVS and SVN and Microsoft SourceSafe is
>> > that you don't have the benefits of version control most of the time,
>> > because all commits are *public*.
>>
>> David,
>>
>> That is exactly what "branch" is for in CVS and SVN.  Dunno much about
>> M$ SourceSafe.
>
> I've never encountered branch being used that way, anywhere.  It's
> used for things like developing release 2.0 while still supporting 1.5
> and 1.6.
>
> However, especially with merge in svn it might be feasible to use a
> branch that way.  What's the operation to update the branch from the
> trunk in that scenario?
We use personal branches all the time; in fact each developer has at  
least one, sometimes several if they are working on orthogonal issues  
or experimenting with a couple of different approaches to the same  
problem. Personal branches are for messy code, unfinished patches -  
basically anything that took longer than 15 minutes to write. Keeping  
that stuff on just one machine is unworkable as I code from many  
locations, not to mention the server is backed up more often.

Note that when I say 'personal', I mean intended for the use of one
particular person. Some people refer to these as 'private' branches,
but we don't do access control in svn other than on a per-project
level, so other users can take a look at what I'm up to. This allows
me to ask for suggestions or advice without having to email diffs
around.

Updating from trunk is slightly irritating as svn doesn't do merge
tracking ATM (it's in the works, though). Currently I just grep the
commit log for the last merge from trunk (I use a consistent log
message so this is easy).

svn log https://svn.example.com/project/branches/ben | grep 'Merged from trunk'
(note last merged revision)
svn merge -r$LAST_MERGED_REV:HEAD https://svn.example.com/project/trunk /path/to/wc
(fix any conflicts)
svn ci /path/to/wc -m "Merged from trunk r$LASTMERGEDREV"

Of course, you can also cherry-pick changes from other branches or  
tags if you know the revision number(s).

From what I've seen on the svn mailing lists, this is a pretty
common pattern to use. I don't think it's very common in CVS though,
simply because branching and merging are more difficult.

-- 
Ben


Ben Gollmer

2006-Oct-07 06:59 UTC

[zfs-discuss] A versioning FS

On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
> What I'm saying is that I'd like to be able to keep multiple
> versions of my files without "echo *" or "ls" showing them to me by
> default.
Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
don't like the _ you could use @ or some other character.
> I'd like an option for ls(1), find(1) and friends to show file
> versions, and a way to copy (or, rather, un-hide) selected versions
> of files so that I could now refer to them as usual -- when I do this
> I don't care to see version numbers in the file name, I just want to
> give them names.
ln -s ._file.txt.1 first_published_draft.txt
ln -s ._file.txt.5 second_published_draft.txt
> And, maybe, I'd like a way to write globs that match file versions
> (think of extended globbing, as in KSH).
Hmm, I'm not exactly sure what you mean by this, but using a dotfile
scheme would allow you to easily glob for the file names.
> Similarly with applications that keep files open but keep writing
> transactions in ways that the OS can't isolate without input from the
> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
> fsync(2)s would result in no useful versioning.
Presumably you'd create a different fs for your database, turning the
versioning property off. You'd be likely to want to adjust other fs
parameters anyway, judging from some recent posts discussing how to
get the best database performance.
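
For the dotfile naming scheme suggested earlier in this message
(file.txt -> ._file.txt.1, ._file.txt.2, ...), finding the versions of one
file is a simple pattern match. A minimal Python sketch follows; the function
name is hypothetical and the directory listing is passed in as a plain list
so the example does not depend on any real filesystem support.

```python
import re

def version_names(filename, directory_listing):
    """Return the '._<filename>.<N>' entries for `filename`, oldest first."""
    pattern = re.compile(r"\._" + re.escape(filename) + r"\.(\d+)")
    found = []
    for name in directory_listing:
        match = pattern.fullmatch(name)
        if match:
            found.append((int(match.group(1)), name))
    # Sort numerically so that version 10 comes after version 2.
    return [name for _, name in sorted(found)]
```

A plain shell glob such as ._file.txt.* would also match these names, but it
sorts them lexically and would pick up non-numeric suffixes as well.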

-- 
Ben



Anton B. Rang

2006-Oct-07 08:02 UTC

[zfs-discuss] Re: A versioning FS

> If you disagree, please tell us *why* you think snapshots don't
> solve the problem.
Three reasons.

First of all, unless we have per-file snapshots, there's no way to keep old
versions of particularly important files without keeping old versions of
everything else. If I have a 4 GB video in my home directory, and my 50 KB file
containing finance data, keeping the last version of the 50 KB file (last edited
two weeks ago) means keeping the 4 GB file around. Forever, if I never make
another change to the 50 KB file.

Second, with any rational implementation of file versioning, the end user has
control over the number of versions kept for a particular file. Generally
snapshots are administratively defined rather than end-user defined, and not at
file granularity.

Third, snapshots are tied to time, not change. A real-life example: One day I
logged into a VAX to test a small program and discovered the Ada compiler wasn't
working because it was complaining about an error in its configuration file. It
turned out I'd edited that some eight months earlier (two semesters ago) and
made an error which had never been caught as I'd finished the course involved.
I simply deleted the current (bad) version, reverting to the last good version.
Even if I'd had snapshots going back that far, it would have been painful to
find which one (if any) had the correct version of the file. Similarly, I can
edit a script, test it, find that the change doesn't work, and go back, all
within 150 seconds or so. The chances that a snapshot would pick up the previous
version in this scenario are low.

Technically, these could be seen as arguing against the current *implementation*
of snapshots. One can envision per-file, user-configurable snapshots. Those
would come close, though the third argument above is still an issue. (I can also
imagine a "snapshot only if modified" command which might help there.)

That said, do file versions fit into UNIX? I think they could be made to, but
they would change existing behavior, which could confuse either users (amply
demonstrated in these threads) or applications.

(For what it's worth, incidentally, most users don't use the command line,
believe it or not....)
 
 

Erik Trimble

2006-Oct-07 10:13 UTC

Snapshots of an active file (was: Re: [zfs-discuss] A versioning FS)

Joseph Mocker wrote:
> Erik Trimble wrote:
>
>>
>> The developers can answer this definitively, but I believe the answer 
>> to your questions is NO.  That is, if there is anything in the buffer 
>> waiting to be written when a snapshot request comes along, the buffer 
>> is written out so that the file is consistent with the last write().  
>> So, snapshotting should NEVER cause a file corruption in this matter. 
>> That said, if you are doing the following:
>>
>> 1. App issues write() for data A
>> 2. snapshot request
>> 3. App issues write for data B
>>
>> Then yes, the snapshot file will only contain data A, and not data B, 
>> which might lead to an inconsistency in the app's behavior, if both A
>> and B were important to be written together.
>
> Yes, this is what I was talking about.
>
>> But if that were the case, then the app should have written A and B 
>> atomically.
>>
> And how realistic is that? You are suggesting, for example, that every 
> application that writes an XML file should buffer the _entire_ XML 
> stream in memory and issue a single atomic write of that entire 
> document.  That's not realistic, and in some cases not even possible.
>
> Otherwise, uh, we better fix sed then :-)
>
> --joe
>
There is no real answer to this problem. If I have to wait for the app 
to issue a close() on a file before taking a copy for the snapshot, then 
I can wait indefinitely, and could potentially NEVER get anything for 
the snapshot.   If I somehow manage to do some magic and set aside a 
copy of the file upon open(), just to put that file into a (possible) 
snapshot, then yes, I get a consistent file, which is out-of-date w/r/t 
the current one (and, kinda wrecks the concept of a snapshot being "what 
is there at time X").  And, as you note, if I wait for pending write() 
to finish before immediately taking a copy, then I run into the problem 
of apps possibly not leaving the file in a consistent state.

No, you won't ever have file Corruption (i.e. incomplete write()
screwing the file completely), but you certainly might have file
Inconsistency, which is an application problem, and not one that can be
dealt with at the FileSystem level, as it requires a knowledge of what
makes a "consistent" file, from the app's standpoint.

This is the exact same problem as backup software has had forever, so it 
is not new to snapshots.  Which is why you want to take backups (and 
snapshots) of a quiet filesystem if at all possible.
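
The write A / snapshot / write B sequence discussed above can be simulated in
user space. In this Python sketch a plain file copy stands in for the
snapshot, and the file names and XML fragments are made up for illustration;
it says nothing about how ZFS implements snapshots internally.

```python
import os
import shutil

def write_with_snapshot(workdir):
    """Write a document in two steps with a 'snapshot' (a file copy) taken
    in between. Returns (final_contents, snapshot_contents)."""
    live = os.path.join(workdir, "doc.xml")
    snap = os.path.join(workdir, "doc.xml.snapshot")
    with open(live, "w") as f:
        f.write("<doc><a/>")         # 1. app issues write() for data A
        f.flush()                    #    A leaves the application's buffer
        shutil.copyfile(live, snap)  # 2. snapshot request
        f.write("<b/></doc>")        # 3. app issues write() for data B
    with open(live) as f, open(snap) as s:
        return f.read(), s.read()
```

The snapshot copy ends up holding a complete write A but no write B: nothing
is corrupted, yet the captured document is not well-formed XML, which is the
application-level inconsistency described above.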

-Erik

Erik Trimble

2006-Oct-07 10:14 UTC

[zfs-discuss] A versioning FS

Chad Leigh -- Shire.Net LLC wrote:
>>>             Plus, the number of files being created under typical
>>> modern systems is at least two (and probably three or four) orders
>>> of magnitude greater.  I've got 100,000 files under /usr in Solaris,
>>> and almost 1,000 under my home directory.
>>
>> wimp :-)  I count 88,148 in my main home directory.  I'll bet just
>> running gnome and firefox will get you in the ballpark of 1,000 :-/
>
> None (well, maybe 1 or 2)  of which you edit and hence would not 
> generate versions.
>
> Chad
Richard actually brings up a good point, which answers another question
Chad had for me:  exactly how many files do I edit?   Which directly
impacts the "directory pollution" problem I've been talking about.

There are essentially three scenarios:

(a)  FV is turned on on a per-file basis

(b) FV is turned on on a per-directory basis

(c) FV is turned on on a per-filesystem basis

Now, I think we can all see that you get geometric file explosion in case
(c), as absolutely anything that writes to the filesystem gets
versioned.  Things like Web Browser caches alone would kill you.

In case (b), there's quite a bit of explosion, too.  There are lots of
apps which create, update, and destroy files frequently in various 
directories. Most Office and similar large user apps do this. So it is 
very, very easy to have many versions quickly.  This can be somewhat 
mitigated by NOT turning on FV in directories which are commonly used as 
temp dirs (e.g. ~/tmp)

In case (a), you are down to files you actively tell FV to use, which I 
agree can be quite manageable.  I tend to actively edit a couple of 
dozen files frequently, so that number can be manageable, so long as the 
number of versions is held down to some limit. 

However, in both case (a) and (b) for netFS users, exactly how are they
supposed to indicate that they want FV turned on?  There are no semantics
for doing this in any netFS protocol, so we'd have to have custom
API/tools for them to run to turn on FV.

Also, something to think about:  under FV, do old versions of a file 
which was deleted (via unlink() or similar) also get deleted?

-Erik

Erik Trimble

2006-Oct-07 10:46 UTC

[zfs-discuss] A versioning FS

Chad Leigh -- Shire.Net LLC wrote:
>> But see, that assumes you have a logout-type functionality to use.
>> Which indeed is possible for command-line usage, but then only in a
>> very limited way.   During a typical session, I access almost 20
>> NFS-mounted directories. And anyone using autofs/automount trees gets
>> even more. You're saying that my logout script has to know about all
>> of them to keep things clean?  That's unrealistic.
> It is up to you to come up with a scheme to keep things clean, the 
> same way you do now anyway (downloads, etc),
>
Which is entirely reasonable if the number of places where FV is enabled is
limited, but completely unrealistic if FV is turned on for a large
number of places.  And much more difficult for those restricted to
accessing File Versioned directories over a netFS, where scripting
cleanups can be difficult or highly impractical.
>>   And that still doesn't solve the problem of people who use SAMBA or
>> NFS from machines which don't have an interactive shell logout system
>> (i.e. Windows).
> It is still mounted on their desktops and they can still delete files 
> with FV the same way they do now
>
> No real issue.
Well....   If the versions of everything are kept in the same directory,
then you are going to have a VERY bad user experience with people using
GUI file browsers.  Cleaning up multiple versions of the same file name
is going to be tricky, and you will find people very frequently
accidentally delete the wrong thing.  More importantly, people are going
to consider it a big hassle to have to keep things tidy by hand.   If
the versioning is kept somewhere different than the "current" file
version, then this mitigates things a bit, but you still don't want to
require people to clean this stuff up via a GUI.   And, with Windows,
asking users to use the command prompt for what is normally a GUI
operation isn't acceptable, from a general usability standpoint.
>> This worked fine for the users I knew; even on a system that didn't
>> The problem is we are comparing apples to oranges in user bases here. 
>> TOPS-20 systems had a couple of dozen users (or, at most, a few 
>> hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of 
>> thousands.
> Rarely.  Most of them have in the same range as VMS now or then.
Very, Very few VMS systems that I know about had more than a couple
hundred users.  MIT's main VMS server had only about 2000, with less
than half that active. A couple of Fortune 500 companies I've worked at
in the 90s had VMS systems, and they had very restricted user bases.
VMS simply was never used as a general-purpose file server, and if there
were a fairly large number of users, they were logged in via some custom
app, and never really used the system in the manner we are discussing here.

On the other hand, virtually all the companies I've worked for have had
a UNIX-based file server, with at least a hundred or more UIDs.  And
with Single Sign-on and LDAP becoming the way to go, even mid-sized
companies have systems with over 1,000 users.  10,000 active users
isn't hard to come up with at all.  And, given that Enterprises are a
main target for ZFS, millions of users are entirely within reason.
>> But this requires modifying all the relevant apps, which is the same 
>> amount of work as modifying them to use a new FV API.  It''s
not
>> transparent to the end-user.
>
> Because the semantics of a file name are different on a unix/posix 
> system than they are on a VMS or TOPS-20 system, which had more 
> structured filenames.  I would say that the version cannot be an 
> actual part of the file name but would have to be meta data.  However, 
> it could display as part of the username and the underlying system can 
> be made to do the right thing
>
> ie,
>
> "foo" gets you the latest "foo"
>
> Specifically entering in  foo;7 gets you version 7 or the latest if 
> there are less than 7 versions available.  The app can think of it as 
> being part of the file name, but the underlying system would have to 
> know how to do the right thing in extracting the version out and 
> making it meta data.  Takes some thinking and I am not claiming to 
> have all the answers right now, but hardly undoable.
>
> No app changes are necessary.
No, this is untrue.  Remember that you can't use any character to
indicate FV, as all characters are valid in POSIX file names (well, except
'/').     You CAN'T say "foo;8" gives me version 8 of the file "foo",
because there very well might be a completely different file name
"foo;8" that is NOT any version of the file foo.  VMS and TOPS had
reserved characters for file versioning, and thus you were set. This
isn't true in UNIX filesystems.
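
The ambiguity described above is easy to demonstrate. The Python sketch below
parses a trailing ";N" as a version number and then checks whether the same
string also exists as an ordinary file name. Both helper names and the choice
of ';' as delimiter are hypothetical, following the "foo;8" example in this
message.

```python
def split_version(name, delimiter=";"):
    """Split 'foo;8' into ('foo', 8); return (name, None) if no numeric
    version suffix is present."""
    base, sep, tail = name.rpartition(delimiter)
    if sep and tail.isdigit():
        return base, int(tail)
    return name, None

def is_ambiguous(name, existing):
    """True if `name` could mean either a literal file in `existing` or a
    version of another file that is also in `existing`."""
    base, version = split_version(name)
    return version is not None and name in existing and base in existing
```

If a directory contains both "foo" and a file literally named "foo;8", the
request for "foo;8" has two legitimate meanings, and the filesystem has no way
to tell which one the application intended.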

The only way to do FV in the POSIX concept is to either keep the file 
versions in a separate file tree than the "current" files, or to use 
some sort of an API to access them, and otherwise keep them normally 
hidden from view.

You can't dodge this by simply saying "oh, well, then change the FV
delimiter if it causes you problems".  Aside from the fact that you are
breaking POSIX compatibility by reserving some character for special
use, how confusing would it be to users if the FV delimiter is ";" in
this directory, "&" in that directory, "_" in the one over there, etc.?
That's entirely possible, given the demands of many Windows apps for
file naming.



-Erik

Joerg Schilling

2006-Oct-07 10:50 UTC

[zfs-discuss] A versioning FS

"Jeremy Teo" <white.wristband at gmail.com> wrote:
> A couple of use cases I was considering off hand:
>
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
> All of which can be solved by periodic snapshots, but versioning gives
> us immediacy.
I am sure that the same people who accidentally type rm -rf *
would type rm -rf *\;*

And note that this feature would cause a need to change a lot 
of utilities including all shells (see path name expansion).

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)  
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

2006-Oct-07 11:13 UTC

[zfs-discuss] A versioning FS

Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Fri, Oct 06, 2006 at 12:02:16PM -0700, Matthew Ahrens wrote:
> > In my opinion, the marginal benefit of per-write(2) versions over 
> > snapshots (which can be per-transaction, ie. every ~5 seconds) does
not
> > outweigh the complexity of implementation and use/administration.
>
> Per-write(2) versions would be worse than useless in many, if not most
> cases.  Even per-close(2) versions wouldn''t always be useful.
Even if there is a proper way to find the right time for a micro snapshot,
if the versions live in the standard namespace of the filesystem, it would
cause POSIX compatibility problems and we would need to change many programs.

Jörg


Joerg Schilling

2006-Oct-07 11:28 UTC

[zfs-discuss] A versioning FS

"David Dyer-Bennet" <dd-b at dd-b.net> wrote:
> On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> > First of all, let's agree that this discussion of File Versioning makes
> > no more reference to its usage as Version Control.  That is, we aren't
> > going to talk about it being useful for source code, other than in the
> > context where a source code file is a document, like any other text
> > document.  File Versioning and Version Control are separate things, with
> > different purposes and feature sets.
>
> Hmm; the most important uses of file versioning come, in my opinion,
> when working on source code.  But for handling very different
> situations than source control does.
>
> > OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a
> > new API.  Using the VMS convention of simply creating file names with a
> > version string afterwards is unacceptable, as it creates enormous
> > directory pollution, not to mention user confusion.  So, FV has to be
> > invisible to non-aware programs.
>
> Strongly disagree, twice.
>
> Having FV invisible to programs not updated to specially support it is
> IMHO unacceptable, and would render the feature useless.
Making it visible to programs causes many problems with POSIX compatibility and
would force changes to many programs.

Jörg


Joerg Schilling

2006-Oct-07 11:43 UTC

[zfs-discuss] A versioning FS

Erik Trimble <Erik.Trimble at Sun.COM> wrote:
>
> In order for an FV implementation to be useful for this stated purpose, 
> it must fulfill the following requirements:
>
> (1)  Clean interface for users.  That is, one must NOT be presented with 
> a complete list of all versions unless explicitly asked for it, and it 
> should be simple to select a version based on some reasonable criteria 
> (date of creation/modification, version number, etc.)
>
> (2)  Simple way to decide if a file should be versioned or not. Either 
> automatically version all files (or none at all), or provide a mechanism 
> to turn FV on/off on a per-file or per-directory basis.
>
> (3)  Network-FS awareness.  Without this, FV is severely limited. Given 
> my preconditions above (that is, the current usage pattern of us in the 
> non-FS world), limiting FV to those on the local system restricts its 
> usefulness to the point where it isn't worth the effort.
The only idea I get that matches these criteria is to have the versions
in the extended attribute name space.

Jörg


Joerg Schilling

2006-Oct-07 12:14 UTC

[zfs-discuss] Re: A versioning FS

"Anton B. Rang" <Anton.Rang at Sun.COM> wrote:
> >People are oriented to their files, not to snapshots.
>
> True, though with NetApp-style snapshots, it's not that difficult to
> translate 'src/file.c' to '.snapshot/hourly.0/src/file.c' and see what it
> was like an hour ago. I imagine that a syntax like
> '.snapshot/22:20/src/file.c' would also be easy to use. (On the other hand,
> zfs currently requires knowledge of where the file system is rooted, and
> knowledge of where the current directory is within that filesystem, which
> IMHO is somewhat confusing to users and requires far too much typing.)
>
AFAIR, NetApp has problems caused by the fact that the inode numbers
for the snapshots reside on the same fs.
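
As a rough illustration of the path translation described in the quoted text, here is a minimal sketch. It assumes the ZFS layout in which snapshots appear under ".zfs/snapshot/<name>" at the root of each filesystem; the function name, the mount point and the snapshot name are made up for the example.

```python
from pathlib import PurePosixPath


def snapshot_path(fs_root, file_path, snapshot, snapdir=".zfs/snapshot"):
    """Return where file_path would appear inside a named snapshot.

    fs_root must be the mount point of the filesystem holding file_path;
    this is exactly the extra knowledge the quoted text complains about.
    """
    root = PurePosixPath(fs_root)
    # relative_to() raises ValueError if file_path is not under fs_root.
    relative = PurePosixPath(file_path).relative_to(root)
    return str(root / snapdir / snapshot / relative)


example = snapshot_path(
    "/export/home/joe", "/export/home/joe/src/file.c", "hourly.0"
)
```

With the NetApp-style convention the same helper could be called with snapdir=".snapshot"; either way the caller has to know where the filesystem is rooted.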

Jörg


A. C. Censi

2006-Oct-07 17:19 UTC

[zfs-discuss] Re: A versioning FS

It seems that Windows 2003 supports file versioning (and Vista will
too). I am not familiar with the implementation. AFAIR it is
using the "alternate data stream" feature built into NTFS to work with the
versions and hide them from the user.

Certainly in Vista they will have to handle at least the question of
the GUI user interface. The ADS feature already has some way of being
accessed from the user interface.

-- 
A. C. Censi
accensi [em] gmail [ponto] com
accensi [em] montreal [ponto] com [ponto] br
accensi [em] gmail [ponto] com - Google Talk

Joerg Schilling

2006-Oct-07 17:21 UTC

[zfs-discuss] Re: A versioning FS

"A. C. Censi" <accensi at gmail.com> wrote:
> It seems that Windows 2003 (and Vista will too), supports file
> versioning. I am not familiar with the implementation. AFAIR it is
> using the "alternate data stream" builtin in NTFS, to work with the
> versions and hide the versions from the user.
>
This looks like a simplified version of Sun Extended Attributes.

Jörg


A. C. Censi

2006-Oct-07 18:06 UTC

[zfs-discuss] Re: A versioning FS

For reference, here is what I read about it in the past:
http://www.microsoft.com/technet/windowsvista/library/4ac505e6-dd8b-4ae7-80fa-b9d77cd8104d.mspx

Windows 2003 Server implementation (for server-side copies of client user files):
Working with the Windows Server 2003 Volume Shadow Copy Service
http://www.windowsnetworking.com/articles_tutorials/Windows-Server-2003-Volume-Shadow-Copy-Service.html

Summary:
- named "shadow copy"
- works per NTFS volume (equivalent to a *nix filesystem)
- configurable number of copies, or a % of the volume space for copies
- configurable automatic copies per day
- from the GUI it is accessed via the Properties/Previous Versions tab
(What if the original file is deleted? Is the oldest version promoted?)

ACC

On 10/7/06, Joerg Schilling <Joerg.Schilling at fokus.fraunhofer.de> wrote:
> "A. C. Censi" <accensi at gmail.com> wrote:
>
> > It seems that Windows 2003 (and Vista will too), supports file
> > versioning. I am not familiar with the implementation. AFAIR it is
> > using the "alternate data stream" builtin in NTFS, to work with the
> > versions and hide the versions from the user.
> >
>
> This looks like a simplified version of Sun Extended Attributes.
>
> Jörg
>

-- 
A. C. Censi
accensi [em] gmail [ponto] com
accensi [em] montreal [ponto] com [ponto] br
accensi [em] gmail [ponto] com - Google Talk

Joseph Mocker

2006-Oct-07 23:36 UTC

[zfs-discuss] Re: Snapshots of an active file

Erik Trimble wrote:
> Joseph Mocker wrote:
>
>> Erik Trimble wrote:
>>
>>>
>>> The developers can answer this definitively, but I believe the
>>> answer to your questions is NO.  That is, if there is anything in
>>> the buffer waiting to be written when a snapshot request comes
>>> along, the buffer is written out so that the file is consistent with
>>> the last write().  So, snapshotting should NEVER cause a file
>>> corruption in this manner. That said, if you are doing the following:
>>>
>>> 1. App issues write() for data A
>>> 2. snapshot request
>>> 3. App issues write for data B
>>>
>>> Then yes, the snapshot file will only contain data A, and not data
>>> B, which might lead to an inconsistency in the app's behavior, if
>>> both A and B were important to be written together.
>>
>>
>> Yes, this is what I was talking about.
>>
>>> But if that were the case, then the app should have written A and B
>>> atomically.
>>>
>> And how realistic is that? You are suggesting, for example, that
>> every application that writes an XML file should buffer the _entire_
>> XML stream in memory and issue a single atomic write of that entire
>> document.  That's not realistic, and in some cases not even possible.
>>
>> Otherwise, uh, we better fix sed then :-)
>>
>> --joe
>>
>
> There is no real answer to this problem. If I have to wait for the app 
> to issue a close() on a file before taking a copy for the snapshot, 
> then I can wait indefinitely, and could potentially NEVER get anything 
> for the snapshot.   If I somehow manage to do some magic and set aside 
> a copy of the file upon open(), just to put that file into a 
> (possible) snapshot, then yes, I get a consistent file, which is 
> out-of-date w/r/t the current one (and, kinda wrecks the concept of a
> snapshot being "what is there at time X").  And, as you note, if I
> wait for pending write() to finish before immediately taking a copy,
> then I run into the problem of apps possibly not leaving the file in a 
> consistent state.
>
> No, you won't ever have file Corruption (i.e. incomplete write()
> screwing the file completely), but you certainly might have file
> Inconsistency, which is an application problem, and not one that can
> be dealt with at the FileSystem level, as it requires a knowledge of
> what makes a "consistent" file, from the app's standpoint.
>
> This is the exact same problem as backup software has had forever, so 
> it is not new to snapshots.  Which is why you want to take backups 
> (and snapshots) of a quiet filesystem if at all possible.
Which brings me back to the point of file versioning. If an
implementation created a version whenever a file is open()ed with
write bits set, there would be no potential for broken files like this.

Also, it would seem that your statement about snapshots as being "what 
is there at time X" in a nutshell, describes why snapshots are different 
than file versioning.  File versioning is not temporal in the same way.
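
On the question of whether an application must buffer a whole document in memory to get an atomic update: one common pattern, sketched below, is to stream the data to a temporary file in the same directory and then rename it over the target. This is only an illustration; the function name is made up, and it relies on rename being atomic within one filesystem on POSIX systems.

```python
import os
import tempfile


def write_atomically(path, chunks):
    """Stream chunks to a temp file, then atomically rename it over path.

    Readers, snapshots or version captures of `path` see either the old
    complete file or the new complete file, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:      # no need to hold everything in memory
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())      # push the data to disk before renaming
        os.replace(tmp, path)         # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)            # do not leave a stray temp file behind
        raise
```

A snapshot taken mid-write may still contain the partially written temporary file, but the file under its real name stays consistent.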


  --joe

Torrey McMahon

2006-Oct-08 02:33 UTC

[zfs-discuss] Re: Snapshots of an active file

Joseph Mocker wrote:
>
> Which brings me back to the point of file versioning. If an
> implementation created a version whenever a file is open()ed
> with write bits set, there would be no potential for broken files like
> this.

I'm showing my lack of knowledge on this one but I thought SAM-FS could
do something like this. Anyone know for sure?

Of course this doesn't help for apps that keep files open all the time.

Siegfried Nikolaivich

2006-Oct-08 06:03 UTC

[zfs-discuss] Re: A versioning FS

> So, if I build it, people will want it? ;)
I think implementing this feature would help Apple adopt ZFS for Time Machine,
which is essentially a versioning FS in practice.  Actually I don't
know if Apple does this, but you can increment versions with kernel
notifications of file changes (Spotlight).


Cheers
 
 
This message posted from opensolaris.org

Ian Collins

2006-Oct-08 09:52 UTC

[zfs-discuss] A versioning FS

David Dyer-Bennet wrote:
>
> Actually, "save early and often" is exactly why versioning is
> important.  If you discover you've gone down a blind alley in some
> code, it makes it easy to get back to the earlier spots.  This, in my
> experience, happens at a detail level where you won't (in fact can't)
> be doing checkins to version control.
Isn't that what your editor's undo command is for?

Ian

Anton B. Rang

2006-Oct-08 16:38 UTC

[zfs-discuss] Re: Re: Snapshots of an active file

> I'm showing my lack of knowledge on this one but I thought SAM-FS could
> do something like this. Anyone know for sure?
It's not quite the same, and not out-of-the-box.

SAM-FS has the ability to create an archive copy of files onto disk or tape when
the files are closed after having been modified. These copies may not be made
immediately; their timing depends on rules set by the system administrator.
Hence they are not "instant" versions.

More importantly, there is (currently) no easy way to retrieve an old version.
When the archiver makes a copy, the location of the copy is logged. It is
possible to use this log to retrieve an older copy even after the file has been
overwritten, and there are several third parties who have written software which
enables this.

Things get trickier when tape recycling comes into play, since the recycler does
not know about the desire to keep old versions. At sites which require
recycling, the positions of old versions have to be logged into a new file
system; or the recycler has to be replaced with a variant which understands
versioning. Third parties have done both of these in the past, but we (Sun)
don't currently ship this.

There are plans to add a more robust versioning feature to SAM-FS but I don't
believe there is a definite date or release attached yet.
 
 

Nicolas Williams

2006-Oct-08 17:13 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 06:17:12PM -0700, Joseph Mocker wrote:
> David Dyer-Bennet's post gives a hint of how this could be done without
> any API. Simply augment a few system calls like open(), unlink(), etc.
> Calls that can potentially change files. Since you can't change a file
> unless it is open()'ed with various write flags like O_WRONLY, O_RDWR, etc,
> this could be an ideal place to create the version.

I wrote about the same thing.  These are but heuristics.

> One could probably write a "poor man's" FV LD_PRELOAD library to do this
> without the filesystem's knowledge at all.

Indeed, and I believe it has been done before (search freshmeat.net).

> It wouldn't be as efficient with space as could be done at the
> filesystem level, but as someone said, disk is cheap.

Suppose ZFS gave you a primitive for efficiently "snapshotting"
individual files, rather than entire filesystems.  That's the primitive
you'd need to implement this LD_PRELOADable object space efficiently.
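
The "version on open for write" heuristic can be sketched at the application level as follows. This is not an LD_PRELOAD library and not anything ZFS provides; it is a Python stand-in for the same idea. The ".versions" directory, the "name;N" naming and the function name are all hypothetical choices made for the example.

```python
import os
import shutil


def open_versioned(path, mode="r", **kwargs):
    """Open path; if the mode can modify an existing file, save a copy first.

    Copies go to a hidden ".versions" directory next to the file
    (a hypothetical layout), named "name;1", "name;2", and so on.
    """
    modifies = any(flag in mode for flag in ("w", "a", "+"))
    if modifies and os.path.isfile(path):
        directory, name = os.path.split(os.path.abspath(path))
        store = os.path.join(directory, ".versions")
        os.makedirs(store, exist_ok=True)
        n = 1
        while os.path.exists(os.path.join(store, f"{name};{n}")):
            n += 1                      # find the next free version number
        shutil.copy2(path, os.path.join(store, f"{name};{n}"))
    return open(path, mode, **kwargs)
```

Each version here is a full copy, which is exactly the space inefficiency noted above; a per-file snapshot primitive in the filesystem would let the copy step share unchanged blocks instead.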

Nico
--

Nicolas Williams

2006-Oct-08 17:15 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 06:22:01PM -0700, Joseph Mocker wrote:
> Nicolas Williams wrote:
> >Automatically capturing file versions isn't possible in the general case
> >with applications that aren't aware of FV.
> >
> Don't snapshots have the same problem? A snapshot could potentially be
> taken when a file is partially written or updated, no?
And backups in general.

Nico
--

Erik Trimble

2006-Oct-08 22:38 UTC

[zfs-discuss] Re: Snapshots of an active file

Joseph Mocker wrote:
> Which brings me back to the point of file versioning. If an
> implementation created a version whenever a file is open()ed
> with write bits set, there would be no potential for broken files like
> this.
>
> Also, it would seem that your statement about snapshots as being "what
> is there at time X" in a nutshell, describes why snapshots are
> different than file versioning.  File versioning is not temporal in
> the same way.
>
That is correct. File Versioning is primarily User-Driven (that is,
executed and completed at the End-User's command), if you implement it
using open() and close() as the drivers (which, the consensus seems to
be, is the sane way to do FV).  So, in theory, FV should never
result in any file Inconsistency or Corruption.  Snapshots are
essentially System-driven, and as such are about a Point-in-Time for the
System, not a Point-in-State of an App, which FV centers on.

Snapshots and Backup definitely can result in Inconsistency, as they
don't tend to communicate with the app holding a file open. Backups have
mitigated this problem with certain apps which tend to hold files open
for extended time (primarily DBs), by allowing the Backup program to
talk to the app, and have the app write a consistent state to disk
before the Backup program runs.

Snapshots are indeed a different beast than FV, in both subtle and 
not-so-subtle ways.  They fit much more into the class of Backup.  
Honestly, I've thought that through this FV thread, we should never
reference snapshots for functionality, as they really aren't comparable.
Apples to Oranges and all.

-Erik

Erik Trimble

2006-Oct-08 22:47 UTC

[zfs-discuss] A versioning FS

Joerg Schilling wrote:
> Erik Trimble <Erik.Trimble at Sun.COM> wrote:
>
>> In order for an FV implementation to be useful for this stated purpose,
>> it must fulfill the following requirements:
>>
>> (1)  Clean interface for users.  That is, one must NOT be presented with
>> a complete list of all versions unless explicitly asked for it, and it
>> should be simple to select a version based on some reasonable criteria
>> (date of creation/modification, version number, etc.)
>>
>> (2)  Simple way to decide if a file should be versioned or not. Either
>> automatically version all files (or none at all), or provide a mechanism
>> to turn FV on/off on a per-file or per-directory basis.
>>
>> (3)  Network-FS awareness.  Without this, FV is severely limited. Given
>> my preconditions above (that is, the current usage pattern of us in the
>> non-FS world), limiting FV to those on the local system restricts its
>> usefulness to the point where it isn't worth the effort.
>
> The only idea I get that matches these criteria is to have the versions
> in the extended attribute name space.
>
> Jörg
>
Realistically speaking, that's my conclusion, if we want a nice clean,
well-designed solution. You need to hide the versioning info in the
meta-tags, and create a whole new API for accessing/manipulating them.
This easily solves (1) and (2) above, but (3) is the huge problem, as
having a new API means you need to change the SMB/NFS protocols to allow
for client machines to access the new API.  With the new Windows NTFS
"versioning", we at least have something to hook into for Windows, but
UNIX clients will need to have a whole new suite of tools written, and a
raft of current apps modified to take advantage of FV.

That said, FV may very well be worth it, and it certainly is worthy of a 
community-driven exploratory implementation.

-Erik

Nicolas Williams

2006-Oct-08 23:23 UTC

[zfs-discuss] A versioning FS

On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote:
> The only idea I get that matches these criteria is to have the versions
> in the extended attribute name space.
Indeed.  All that's needed then, CLI UI-wise, beyond what we have now is
a way to rename version extended attributes to new files, or at least
copy them (we have the latter).  And it nicely hides versions.  And it
nicely provides an API for creating them on demand ("magic" extended
attributes), and remote access.

Nico
--

Wee Yeh Tan

2006-Oct-09 01:27 UTC

[zfs-discuss] A versioning FS

On 10/7/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> I've never encountered branch being used that way, anywhere.  It's
> used for things like developing release 2.0 while still supporting 1.5
> and 1.6.
>
> However, especially with merge in svn it might be feasible to use a
> branch that way.  What's the operation to update the branch from the
> trunk in that scenario?
You "merge" the changes from the main trunk.


-- 
Just me,
Wire ...

Wee Yeh Tan

2006-Oct-09 01:40 UTC

[zfs-discuss] A versioning FS

On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
> On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
> > What I'm saying is that I'd like to be able to keep multiple versions of
> > my files without "echo *" or "ls" showing them to me by default.
>
> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
> don't like the _ you could use @ or some other character.
You missed Nicolas's point.

It does not matter which delimiter you use.  I still want my "for i in
*; do ..." to work as it does now.

We want to differentiate files that are created intentionally from
those that are just versions.  If files start showing up on their
own, a lot of my scripts will break.  Still, an FV-aware
shell/program/API can accept an environment setting that may quiesce
the version output. E.g. "export show-version=off/on".
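
A minimal sketch of what such an environment switch could look like in an FV-aware listing routine. Everything here is hypothetical: the "name;N" version naming, the function name, and the variable name SHOW_VERSIONS (used instead of "show-version" because POSIX shell variable names cannot contain a hyphen).

```python
import os
import re

# Hypothetical convention: a version is "name;N" with N a decimal number.
VERSION_SUFFIX = re.compile(r";\d+$")


def list_entries(names, environ=None):
    """Return names, hiding version entries unless SHOW_VERSIONS=on."""
    env = os.environ if environ is None else environ
    show = env.get("SHOW_VERSIONS", "off") == "on"
    return [n for n in names if show or not VERSION_SUFFIX.search(n)]
```

With the default setting a loop over the result behaves like today's "for i in *", and only tools that opt in ever see the versions.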

-- 
Just me,
Wire ...

Jonathan Edwards

2006-Oct-09 02:28 UTC

[zfs-discuss] A versioning FS

On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
> On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
>> On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
>> > What I'm saying is that I'd like to be able to keep multiple
>> > versions of my files without "echo *" or "ls" showing them to me
>> > by default.
>>
>> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
>> don't like the _ you could use @ or some other character.
>
> You missed Nicolas's point.
>
> It does not matter which delimiter you use.  I still want my "for i in
> *; do ..." to work as it does now.
>
> We want to differentiate files that are created intentionally from
> those that are just versions.  If files start showing up on their
> own, a lot of my scripts will break.  Still, an FV-aware
> shell/program/API can accept an environment setting that may quiesce
> the version output. E.g. "export show-version=off/on".
>
if we're talking implementation - i think it would make more sense to
store the block version differences in the base dnode itself rather than
creating new dnode structures to handle the different versions.  You'd
then structure different tools or flags to handle the versions (copy them
to a new file/dnode, etc) - standard or existing tools don't need to know
about the underlying versions.

.je

Nicolas Williams

2006-Oct-09 02:30 UTC

[zfs-discuss] A versioning FS

On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
> No, any sane VC protocol must specifically forbid the checkin of the
> stuff I want versioning (or file copies or whatever) for.  It's
> partial changes, probably doesn't compile, nearly certainly doesn't
> work.  This level of work product *cannot* be committed to the
> repository.
>
> [...]
>
> One of the big problems with CVS and SVN and Microsoft SourceSafe is
> that you don't have the benefits of version control most of the time,
> because all commits are *public*.
I think what you're saying is something like this: a VC repository is
one thing, but when I'm working on something not ready to put into that
repository I still want versioning in my "workspace."

That's still VC though!

In Teamware you use SCCS for version control in your workspace, then, if
you have wx (a script built atop Teamware) you collapse the SCCS deltas
to remove all the intermediate work and 'putback' just the end result to
the parent repository.

In Teamware the distinction between repository and workspace isn't :)

But you can work that way in many other VCs.  In PRCS, for example, you
can checkout a project, check it into a new repository, check in changes
as you go, then later do this again with the "trunk," merge, then
check-in to the original repository.  Or you can use one repository and
delete unsightly history.

Mercurial supports the model of development we use in ON based on
Teamware.  So you can also get version control for your intermediate
versions using Mercurial and lose the unsightly history when you're
ready to commit your changes to the gate.

It's been a while since I've used ClearCase, but I'm pretty sure there's
something like this there as well.

And, in any case, I think any good VC supports this.  And all should,
because with file versioning a la VMS I don't get a lot of things I
need, like comments, branches, history, merges, etc...

Nico
--

Nicolas Williams

2006-Oct-09 02:42 UTC

[zfs-discuss] A versioning FS

On Mon, Oct 09, 2006 at 09:27:14AM +0800, Wee Yeh Tan wrote:
> On 10/7/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> >I've never encountered branch being used that way, anywhere.  It's
> >used for things like developing release 2.0 while still supporting 1.5
> >and 1.6.
> >
> >However, especially with merge in svn it might be feasible to use a
> >branch that way.  What's the operation to update the branch from the
> >trunk in that scenario?
> 
> You "merge" the changes from the main trunk.
I think David meant something else.  History of intermediate changes is
often useless, particularly if some of those changes don't build.

In ON development we've used Teamware for years, and for years we've had
a policy that intermediate deltas must be collapsed.  We have a script,
'wx', that can do that trivially, and good thing too, because collapsing
deltas without it is a pain.

(I.e., in Teamware terms, if you bringover version 1.7 of some file,
check-in 1.8, then 1.9, then putback to the parent workspace you'll be
creating versions 1.8 and 1.9 in the parent when no one needs to see 1.8,
so what you want to do is collapse those two deltas, which then become
version 1.8, and that's what you putback.)

But this is a lame argument for FV!  Because any good VC lets you
version intermediate work without polluting the main trunk when you're
done.

Nico
--

Nicolas Williams

2006-Oct-09 02:46 UTC

[zfs-discuss] A versioning FS

On Sun, Oct 08, 2006 at 10:28:06PM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
> >On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
> >>Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
> >>don't like the _ you could use @ or some other character.
> >
> >It does not matter which delimiter you use.  I still want my "for i in
> >*; do ..." to work as it does now.
.<prefix> might be acceptable, but it rubs me the wrong way because of
this:
> >We want to differentiate files that are created intentionally from
> >those that are just versions.  If files start showing up on their
> >own, a lot of my scripts will break.  Still, an FV-aware
> >shell/program/API can accept an environment setting that may quiesce
> >the version output. E.g. "export show-version=off/on".
Exactly.
> if we're talking implementation - i think it would make more sense to
> store the block version differences in the base dnode itself rather than
> creating new dnode structures to handle the different versions.  You'd
> then structure different tools or flags to handle the versions (copy them
> to a new file/dnode, etc) - standard or existing tools don't need to know
> about the underlying versions.
You're arguing for treating FV as extended/named attributes :)

I think that'd be the right thing to do, since we have tools that are
aware of those already.  Of course, we're talking about somewhat magical
attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
some strange verbiage limiting attributes to "applications").

Nico
--

Wee Yeh Tan

2006-Oct-09 02:52 UTC

[zfs-discuss] A versioning FS

On 10/9/06, Jonathan Edwards <Jonathan.Edwards at sun.com> wrote:
> > We want to differentiate files that are created intentionally from
> > those that are just versions.  If files start showing up on their
> > own, a lot of my scripts will break.  Still, an FV-aware
> > shell/program/API can accept an environment setting that may quiesce
> > the version output. E.g. "export show-version=off/on".
>
> if we're talking implementation - i think it would make more sense to
> store the block version differences in the base dnode itself rather than
> creating new dnode structures to handle the different versions.  You'd
> then structure different tools or flags to handle the versions (copy them
> to a new file/dnode, etc) - standard or existing tools don't need to know
> about the underlying versions.
The beauty of extending the dnode is that it will continue to behave
nicely through renames or multiple hardlinks.  However, handling
Erik's concerns about recovering deleted files will require a bit more
work (mainly concerns about how a user will recover his file(s)).
There may also be performance considerations if mass version
purging happens often.


-- 
Just me,
Wire ...

Jonathan Edwards

2006-Oct-09 03:16 UTC

[zfs-discuss] A versioning FS

On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
> On Sun, Oct 08, 2006 at 10:28:06PM -0400, Jonathan Edwards wrote:
>> On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
>>> On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
>>>> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
>>>> don't like the _ you could use @ or some other character.
>>>
>>> It does not matter which delimiter you use.  I still want my "for i in
>>> *; do ..." to work as it does now.
>
> .<prefix> might be acceptable, but it rubs me the wrong way because of
> this:
>
>>> We want to differentiate files that are created intentionally from
>>> those that are just versions.  If files start showing up on their
>>> own, a lot of my scripts will break.  Still, an FV-aware
>>> shell/program/API can accept an environment setting that may quiesce
>>> the version output. E.g. "export show-version=off/on".
>
> Exactly.
>
>> if we're talking implementation - i think it would make more sense to
>> store the block version differences in the base dnode itself rather than
>> creating new dnode structures to handle the different versions.  You'd
>> then structure different tools or flags to handle the versions (copy them
>> to a new file/dnode, etc) - standard or existing tools don't need to know
>> about the underlying versions.
>
> You're arguing for treating FV as extended/named attributes :)
kind of - but one of the problems with EAs is the increase/bloat in  
the inode/dnode structures and corresponding incompatibilities with  
other applications or tools.  Another approach might be to put it all  
into the block storage rather than trying to stuff it into the  
metadata on top.  If we look at the zfs on-disk structure instead and  
simply extend the existing block pointer mappings to handle the diffs  
along with a header block to handle the version numbers - this might  
be an easier way out rather than trying to redefine or extend the  
dnode structure.   Of course you'd still need a single attribute to
flag reading the version block header and corresponding diff blocks,  
but this could go anywhere - even a magic acl perhaps .. i would  
argue that the overall goal should be aimed toward the reduction of  
complexity in the metadata nodes rather than attempting to extend  
them and increase the seek/parse time.
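
To make the "version header plus shared blocks" idea a little more concrete, here is a toy in-memory model. It is only a sketch under loud assumptions: it is not the ZFS on-disk format, the class and method names are invented, and real block pointers carry far more than an integer id.

```python
class VersionedFile:
    """Toy model: a version is a tuple of block ids; blocks are stored once."""

    def __init__(self, blocks):
        self.store = {}       # block id -> bytes (stands in for block storage)
        self.versions = []    # stands in for the "version header block"
        self.versions.append(tuple(self._put(b) for b in blocks))

    def _put(self, data):
        block_id = len(self.store)
        self.store[block_id] = data
        return block_id

    def write_block(self, index, data):
        """Create a new version differing from the latest in one block."""
        latest = list(self.versions[-1])
        latest[index] = self._put(data)   # copy-on-write: old block is kept
        self.versions.append(tuple(latest))

    def read(self, version=-1):
        """Return the contents of a version (default: the newest one)."""
        return b"".join(self.store[i] for i in self.versions[version])
```

Usage would look like creating VersionedFile([b"aa", b"bb", b"cc"]), calling write_block(1, b"XX"), and reading version 0 or the latest; the two versions share the unchanged blocks, so only one extra block is stored.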

.je

Nicolas Williams

2006-Oct-09 03:54 UTC

[zfs-discuss] A versioning FS

On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
> >You're arguing for treating FV as extended/named attributes :)
> 
> kind of - but one of the problems with EAs is the increase/bloat in  
> the inode/dnode structures and corresponding incompatibilities with  
> other applications or tools.
This in a thread where folks [understandably] claim that storage is
cheap and abundant.  And I agree that it is.

Plus, I think you may be jumping to conclusions about the bloat of
extended attributes:
>                               Another approach might be to put it all  
> into the block storage rather than trying to stuff it into the  
> metadata on top.  If we look at the zfs on-disk structure instead and  
> simply extend the existing block pointer mappings to handle the diffs  
> along with a header block to handle the version numbers - this might  
> be an easier way out rather than trying to redefine or extend the  
> dnode structure.   Of course you'd still need a single attribute to
> flag reading the version block header and corresponding diff blocks,  
> but this could go anywhere - even a magic acl perhaps .. i would  
> argue that the overall goal should be aimed toward the reduction of  
> complexity in the metadata nodes rather than attempting to extend  
> them and increase the seek/parse time.
Wait a minute -- the extended attribute idea is about *interfaces*, not
internal implementation.  I certainly did not argue that a file version
should be copied into an EA.

Let's keep interface and implementation details separate.  Most of this
thread has been about interfaces precisely because that's what users
will interact with; users won't care one bit about how it's all
implemented under the hood.

Nico
--

Nicolas Williams

2006-Oct-09 04:14 UTC

[zfs-discuss] Re: Snapshots of an active file

On Sun, Oct 08, 2006 at 03:38:54PM -0700, Erik Trimble wrote:
> Joseph Mocker wrote:
> >Which brings me back to the point of file versioning. If an 
> >implementation were based on something like when a file is open()ed 
> >with write bits set. There would be no potential for broken files like 
> >this.
> >
> >Also, it would seem that your statement about snapshots as being "what
> >is there at time X" in a nutshell, describes why snapshots are 
> >different than file versioning.  File versioning is not temporal in 
> >the same way.
> >
> That is correct. File Versioning is primarily User-Driven (that is, 
> executed and completed at the End-User's command), if you implement it
> using open() and close() as the drivers. (which, seems to be the 
> consensus, is the sane way to do FV).  So, in theory, FV should never 
> result in any file Inconsistency or Corruption.  Snapshots are 
> essentially System-driven, and as such, about a Point-in-Time for the 
> System, not a Point-in-State of an App, which FV centers on.
I don't agree entirely.  For many apps heuristic FV boundaries will do.
There are apps for which it won't.

And we've not talked about files unlinked on rename.  Should the new
file's FV history replace the old one's?  Should the histories be
merged?

I think heuristic FV has its place, but it won't do in general.
> Snapshots and Backup definitely can result in Inconsistency, as they
> don't tend to communicate with the app holding a file open.  Backups have
> mitigated this problem with certain apps which tend to hold files open
> for extended time (primarily DBs), by allowing the Backup program to
> talk to the app, and have the app write a consistent state to disk
> before the Backup program runs.
Sort of.  If you can quiesce the application then you can snapshot the FS
safely.  Or if the application has any sort of recovery (journalling +
rollback, say), then you may be able to snapshot safely at any time.

Of course, if you have large filesystems dedicated to multiple apps
(e.g., home directories), then you typically can't quiesce all of those
apps.
> Snapshots are indeed a different beast than FV, in both subtle and
> not-so-subtle ways.  They fit much more into the class of Backup.
> Honestly, I've thought that through this FV thread, we should never
> reference snapshots for functionality, as they really aren't comparable.
> Apples to Oranges and all.
More like giant oranges (snapshots) to mini-mandarins (FV).

I'm not saying that FV is a bad idea -- IMO it's a good idea.  I'm
concerned about the interfaces.  VMS-style in-your-face FV seems like a
bad idea to me, and only heuristic-driven FV does too.  And then there's
semantics to iron out (e.g., unlinks on rename).

Things to work out:

 - APIs for creating FV		(magic EAs provide an easy answer)
 - APIs for accessing FV	(EAs provide an easy answer)
 - UIs for accessing FV		(EAs provide an easy answer)
 - heuristics for automatic FV	(open for write, atomic appends,
				 fsyncs, unlinks, ...)

 - UIs for controlling automatic FV	(ideas?  FS properties come to
					 mind; EAs; LD_PRELOAD has been
					 mentioned)

 - FV semantics in the face of POSIX
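
To make the first few items in the list above concrete, here is a minimal in-memory sketch of the interface idea: versions are created heuristically on each overwrite, stay out of ordinary directory listings, and are reachable only through a side namespace that plays the role of extended attributes. The class and method names (VersionedStore, list_versions, read_version) are hypothetical and purely illustrative; nothing here is a ZFS or Solaris API.

```python
class VersionedStore:
    """Toy model: old versions live in a side namespace, not in listings."""

    def __init__(self):
        self._current = {}   # file name -> current contents
        self._versions = {}  # file name -> older contents, oldest first

    def write(self, name, data):
        # Heuristic boundary (assumed): every completed overwrite of an
        # existing file retires the previous contents as a version.
        if name in self._current:
            self._versions.setdefault(name, []).append(self._current[name])
        self._current[name] = data

    def listdir(self):
        # Ordinary listings show only current files, never old versions.
        return sorted(self._current)

    def list_versions(self, name):
        # Stands in for listing a file's extended-attribute namespace.
        count = len(self._versions.get(name, []))
        return [f"v{i + 1}" for i in range(count)]

    def read_version(self, name, label):
        # Labels are "v1", "v2", ...; v1 is the oldest retained version.
        return self._versions[name][int(label[1:]) - 1]


store = VersionedStore()
store.write("foo", b"one")
store.write("foo", b"two")
store.write("foo", b"three")
print(store.listdir())             # only the current file is visible
print(store.list_versions("foo"))  # old versions appear on request
```

The point of the sketch is only the separation: unmodified tools see one file, while an FV-aware tool asks explicitly for the version namespace.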

Nico
--

Nicolas Williams

2006-Oct-09 04:20 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 05:11:54PM -0700, David Dyer-Bennet wrote:
> I don't recall that on TOPS-20 it was possible to not version.  What
> you could do is set your logout.cmd file to purge your space down to
> one copy when you logged out.
I never used TOPS-20.  I did use VMS.  As I recall it didn't have
anything like hard links (ok, let's not pick nits here: it did have
links, but not link counts, and so everyone avoided links; orphaned
files were a pain).  No hard links simplifies a lot of things.

OTOH, we're stuck with hard links.

My point?  There's not necessarily an obvious, workable mapping of FV in
non-POSIX-like OSes to POSIX ones.

Nico
--

Nicolas Williams

2006-Oct-09 04:25 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 06:33:14PM -0700, Erik Trimble wrote:
> But, because of the explosion in the number of files, you CAN'T
> automatically show all versions. Users will NEVER accept this. The only
> clean way to do this is to show file versions only upon request. Not by
> default.
Besides, what good does it do a user to have FVs visible on every
directory listing all the time?  None, except maybe to give them the
warm and fuzzies.

The user wants those FVs only when he/she screws up :)

OTOH, I know at least one user who would find his directory listings
exploding if FV were on by default: me :)

And don't tell me that I could turn it off, or that it'd be off by
default, as I'm sure our IT department would probably turn it on for all
users by default and would make it difficult for me to get it turned
off.

Nico
--

Nicolas Williams

2006-Oct-09 04:28 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 07:37:47PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
> >This is what Nico and I are talking about:  if you turn on file
> >versioning automatically (even for just a directory, and not a
> >whole filesystem), the number of files being created explodes
> >geometrically.
> 
> But it doesn't.  Unless you are editing geometrically more files.
Perhaps my filing habits aren't very good, as I have many files that
I've edited over the years in very few directories.  Why punish me?

(Also, I believe in the search better, search more, file/sort less model
that Gmail and friends promote.  Filing is a pain.  Searching should be
easy and fast.  Until we get to where searching is always simpler/faster
than scrolling through directory listings I simply could not accept
in-your-face FV.)

Nico
--

Joseph Mocker

2006-Oct-09 05:10 UTC

[zfs-discuss] A versioning FS

Nicolas Williams wrote:
> On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
>> No, any sane VC protocol must specifically forbid the checkin of the
>> stuff I want versioning (or file copies or whatever) for.  It's
>> partial changes, probably doesn't compile, nearly certainly doesn't
>> work.  This level of work product *cannot* be committed to the
>> repository.
>>
>> [...]
>>
>> One of the big problems with CVS and SVN and Microsoft SourceSafe is
>> that you don't have the benefits of version control most of the time,
>> because all commits are *public*.
>
> I think what you're saying is something like this: a VC repository is
> one thing, but when I'm working on something not ready to put into that
> repository I still want versioning in my "workspace."
>
> That's still VC though!
This is just one class of problem that I think VC might be useful for.
We could go on, specific case by case coming up with best practices for
applications, but it seems to me that FV is trying to solve a general
problem in a general way. Whether that is a good idea or bad idea I don't
know.

However would it be great if I could somehow easily FV  a file I am 
working on with some arbitrary (closed) application I am forced to use 
without the application really knowing about it, and with little or no 
actions I have to take to do so?

przemolicc at poczta.fm

2006-Oct-09 07:32 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 11:57:36AM -0700, Matthew Ahrens wrote:
> przemolicc at poczta.fm wrote:
> >On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC 
> >wrote:
> >>But I would dearly like to have a versioning capability.
> >
> >Me too.
> >Example (real life scenario): there is a samba server for about 200
> >concurrent connected users. They keep mainly doc/xls files on the
> >server.  From time to time they (somehow) corrupt their files (they
> >share the files so it is possible) so they are recovered from backup.
> >Having versioning they could be told that if their main file is
> >corrupted they can open the previous version and keep working.
> >ZFS snapshots are not a solution in this case because we would have to
> >create snapshots for 400 filesystems (yes, each user has its filesystem
> >and I said that there are 200 concurrent connections but there are many
> >more accounts on the server) each hour or so.
> 
> I completely disagree.  In this scenario (and almost all others), use of
> regular snapshots will solve the problem.  'zfs snapshot -r' is
> extremely fast, and I'm working on some new features that will make
> using snapshots for this even easier and better-performing.
> 
> If you disagree, please tell us *why* you think snapshots don't solve
> the problem.
Matt,

think of the night, when some (maybe 5%) people still work. With snapshots
I would still have to create snapshots for 400 filesystems each hour because
I don't know which of them are in use. And what about the weekend? Still
400 snapshots each hour? And 'zfs list' will list me 400*24*2=19200 lines?
And how about organizations which have thousands of people and keep their
files on one server? Or ISPs/free e-mail account providers who have millions?

Imagine just ordinary people who use ZFS in their homes and forget to
create snapshots. Or they turn their computer on once and then don't
turn it off: they work daily (and create a snapshot an hour) and don't
turn it off in the evening but leave it working and downloading some
films and music. Still one snapshot an hour? How many snapshots a
day, a week, a month? Thousands? And given that ZFS is _so_easy_ to use,
is managing so many snapshots a ZFS-like feature? (ZFS-like: extremely easy.)

The way ZFS is working right now is that it cares about disks
(checksumming), redundancy (raid*) and performance. Having versioning
would let ZFS care about people's mistakes. And people do make mistakes.
Yes, Matt, you are right that snapshots are a feature which might be used
here but it is not the most convenient in such scenarios. Snapshots are
probably much more useful than versioning in "predictable" scenarios:
backup at night, software development (commit new version) etc.  In a
highly unpredictable environment (many users working at _different_ hours
in different parts of the world) you would have to create many thousands
of snapshots. To deal with them might be painful.

Matt, I agree with you that having snapshots *solves* the problem with
400 filesystems because in an SVM/UFS environment I _wouldn't_ have such
a solution. But I feel that versioning would be much more convenient here.
Imagine that you are the admin of the server and ZFS has versioning: having
a choice, what would you choose in this case?

przemol

Matthew Ahrens

2006-Oct-09 07:57 UTC

[zfs-discuss] A versioning FS

przemolicc at poczta.fm wrote:
>> I completely disagree.  In this scenario (and almost all others), use of
>> regular snapshots will solve the problem.  'zfs snapshot -r' is
>> extremely fast, and I'm working on some new features that will make
>> using snapshots for this even easier and better-performing.
>>
>> If you disagree, please tell us *why* you think snapshots don't solve
>> the problem.
> 
> think of the night, when some (maybe 5%) people still work. With snapshots
> I would still have to create snapshots for 400 filesystems each hour
> because I don't know which of them are in use. And what about the weekend?
> Still 400 snapshots each hour? And 'zfs list' will list me
> 400*24*2=19200 lines?
> And how about organizations which have thousands of people and keep their
> files on one server? Or ISPs/free e-mail account providers who have
> millions?
Yes, this is unfortunate.  I have some forthcoming changes that will 
allow you to take periodic snapshots, but only when changes are 
occurring.  This will greatly decrease the "snapshot explosion" you 
point out.
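
As a rough illustration of the arithmetic behind the "snapshot explosion", the sketch below counts snapshots for przemol's 400 filesystems over a 48-hour weekend, first unconditionally and then only for filesystems that changed. The `changed` predicate and the assumption that exactly 20 of the 400 users (5%) are active are hypothetical inputs for the example; this does not describe how ZFS or the forthcoming feature actually detects changes.

```python
def count_snapshots(filesystems, periods, changed, only_when_changed):
    """Count snapshots taken over `periods` hourly runs.

    `changed(fs, hour)` is a hypothetical predicate reporting whether
    filesystem `fs` was modified since the previous run.
    """
    total = 0
    for hour in range(periods):
        for fs in filesystems:
            if not only_when_changed or changed(fs, hour):
                total += 1
    return total


filesystems = [f"home/user{i}" for i in range(400)]
weekend_hours = 24 * 2


def active(fs, hour):
    # Assumption for illustration: only users 0..19 (5% of 400) are working.
    return int(fs.rsplit("user", 1)[1]) < 20


every_hour = count_snapshots(filesystems, weekend_hours, active,
                             only_when_changed=False)
when_changed = count_snapshots(filesystems, weekend_hours, active,
                               only_when_changed=True)
print(every_hour, when_changed)  # 19200 960
```

Under these assumed numbers the listing shrinks from 19200 entries to 960; the real saving depends entirely on how many filesystems are actually being written to.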
> Matt, I agree with you that having snapshots *solve* the problem with
> 400 filesystems because in an SVM/UFS environment I _wouldn't_ have such
> a solution. But I feel that versioning would be much more convenient here.
> Imagine that you are the admin of the server and ZFS has versioning: having
> a choice, what would you choose in this case?
I think that my preference would depend a lot on the details of the
file versioning.  Certainly, if there is some implementation of file
versioning which allows me to find the old data I'm looking for more
easily than snapshots, and it does not impose an undue performance
burden[*] on the system, then I would choose that.  However, as we're
discovering on this thread, discovering such a scheme is nontrivial :-)

--matt

[*] This includes disk space usage.  For example, how would the space 
used by file versions be accounted/expressed?  How would file versioning 
interact with snapshots?  Including old file versions in snapshots might 
be contrary to the user's expectations, and wasteful of space.  But any
other behavior may be prohibitively complicated to implement.

przemolicc at poczta.fm

2006-Oct-09 08:51 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 06, 2006 at 02:08:34PM -0700, Erik Trimble wrote:
> Also, "save-early-save-often"  results in a version explosion, as does
> auto-save in the app.  While this may indeed mean that you have all of
> your changes around, figuring out which version has them can be
> massively time-consuming.  Let's say you have auto-save set for 5
> minutes (very common in MS Word). That gives you 12 versions per hour.
> If you suddenly decide you want to back up a couple of hours, that
> leaves you with looking at a whole bunch of files, trying to figure out
> which one you want.  E.g. I want a file from about 3 hours ago. Do I
> want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours
> ago?  And, what if I've mis-remembered, and it really was closer to 4
> hours ago?  Yes, the data is eventually there. However, wouldn't a
> 1-hour snapshot capability have saved you an enormous amount of time, by
> being able to simplify your search (and, yes, you won't have _exactly_
> the version you want, but odds are you will have something close, and
> you can put all the time you would have spent searching the FV tree into
> restarting work from the snapshot-ed version).
Erik,

versioning could be managed by a sort of versioning policy controlled by
users. E.g. if a file which is about to be saved (auto-saving)
has a previous version saved within the last 30 minutes, don't create
another "previous" version.

10:00  open file f.xls
10:10  (...working...)
10:20  (...auto save...)  - creates f.xls;1
10:30  (...working...)
10:40  (...auto save...)  - don't create another version, because
                            there is a previous version from within
                            the last 30 minutes

Another policy might be based on the number of previous versions: e.g. if
there are more than 10, purge the oldest.
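
The two policies just described can be sketched in a few lines. This is only a model of the decision logic, assuming the 30-minute window and the limit of 10 from the message; the class name VersionPolicy and its on_save method are made up for the example and do not correspond to any existing ZFS property or API.

```python
from datetime import datetime, timedelta


class VersionPolicy:
    """Decide whether a save should create a new 'previous' version."""

    def __init__(self, min_interval=timedelta(minutes=30), max_versions=10):
        self.min_interval = min_interval
        self.max_versions = max_versions
        self.versions = []  # creation times of kept versions, oldest first

    def on_save(self, now):
        # Policy 1: skip if the newest version is younger than min_interval.
        if self.versions and now - self.versions[-1] < self.min_interval:
            return False
        self.versions.append(now)
        # Policy 2: keep at most max_versions, purging the oldest first.
        while len(self.versions) > self.max_versions:
            del self.versions[0]
        return True


policy = VersionPolicy()
start = datetime(2006, 10, 9, 10, 0)
print(policy.on_save(start + timedelta(minutes=20)))  # 10:20 -> True
print(policy.on_save(start + timedelta(minutes=40)))  # 10:40 -> False
print(policy.on_save(start + timedelta(minutes=60)))  # 11:00 -> True
```

The run mirrors the timeline above: the 10:20 auto-save creates a version, the 10:40 auto-save does not, and a save at 11:00 (40 minutes after the last kept version) does.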

> [...]
> 
> 
> To me, FV is/was very useful in TOPS-20 and VMS, where you were looking 
> at a system DESIGNED with the idea in mind, already have a user base 
> trained to use and expect it, and virtually all usage was local (i.e. no 
> network filesharing). None of this is true in the UNIX/POSIX world.
Versioning could be turned off per filesystem. And also could be
inherited from a parent - exactly like current compression.

przemol

Joerg Schilling

2006-Oct-09 10:38 UTC

[zfs-discuss] A versioning FS

Erik Trimble <Erik.Trimble at Sun.COM> wrote:
> > The only idea I get that matches this criteria is to have the versions
> > in the extended attribute name space.
> >
> > Jörg
> >
> >
> Realistically speaking, that's my conclusion, if we want a nice clean,
> well-designed solution. You need to hide the versioning info in the
> meta-tags, and create a whole new API for accessing/manipulating them.
> This easily solves (1) and (2) above, but (3) is the huge problem, as
> having a new API means you need to change the SMB/NFS protocols to allow
> for client machines to access the new API.  With the new Windows NTFS
There is no need to extend NFS as NFS v4 already supports extended attributes.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

2006-Oct-09 10:39 UTC

[zfs-discuss] A versioning FS

Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote:
> > The only idea I get that matches this criteria is to have the versions
> > in the extended attribute name space.
>
> Indeed.  All that's needed then, CLI UI-wise, beyond what we have now is
> a way to rename versions extended attributes to new files, or at least
> copy them (we have the latter).  And it nicely hides versions.  And it
> nicely provides an API for creating them on demand ("magic" extended
> attributes), and remote access.
>
The infrastructure is there - local or remote via NFSv4 - the problem
is that the extended attribute name space lacks definitions for usage.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Joerg Schilling

2006-Oct-09 10:44 UTC

[zfs-discuss] A versioning FS

Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> You're arguing for treating FV as extended/named attributes :)
>
> I think that'd be the right thing to do, since we have tools that are
> aware of those already.  Of course, we're talking about somewhat magical
> attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
> some strange verbiage limiting attributes to "applications").
I thought NFSv4 supports extended attributes. What "limiting" are you
aware of?


Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Erik Trimble

2006-Oct-09 11:22 UTC

[zfs-discuss] A versioning FS

Joseph Mocker wrote:
> However would it be great if I could somehow easily FV  a file I am
> working on with some arbitrary (closed) application I am forced to use
> without the application really knowing about it, and with little or no
> actions I have to take to do so?

To paraphrase an old wives' tale:

"That ain't gonna happen in my lifetime."

I think that this discussion thread has determined that you DON'T want
to make file versioning visible to un-modified applications.

That said, given that it looks like a new FV API is the current favorite
implementation way, I see no reason why you can't open your favorite
app, edit FOO, and have ZFS do versioning on it behind the scenes.  You
just won't be able to see the versions from inside your App.  You'd need
an FV-aware app (whether a GUI file browser or cmdline util) to access the
old versions, and potentially copy an older version to a new filename,
allowing you to edit it in your non-FV-aware App.

-Erik

Nicolas Williams

2006-Oct-09 15:03 UTC

[zfs-discuss] A versioning FS

On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote:
> Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> 
> > You're arguing for treating FV as extended/named attributes :)
> >
> > I think that'd be the right thing to do, since we have tools that are
> > aware of those already.  Of course, we're talking about somewhat magical
> > attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
> > some strange verbiage limiting attributes to "applications").
> 
> I thought NFSv4 supports extended attributes. What "limiting" are you
> aware of?
It does.  I meant this on pg. 12:

                                                 [...]  Named attributes
   are meant to be used by client applications as a method to associate
   application specific data with a regular file or directory.

and this on pg. 36:

   Named attributes are intended for data needed by applications rather
   than by an NFS client implementation.  NFS implementors are strongly
   encouraged to define their new attributes as recommended attributes
   by bringing them to the IETF standards-track process.

and this on pg. 232:

17.1.  Named Attribute Definition

   The NFS version 4 protocol provides for the association of named
   attributes to files.  The name space identifiers for these attributes
   are defined as string names.  The protocol does not define the
   specific assignment of the name space for these file attributes.
   Even though the name space is not specifically controlled to prevent
   collisions, an IANA registry has been created for the registration of
   NFS version 4 named attributes.  Registration will be achieved
   through the publication of an Informational RFC and will require not
   only the name of the attribute but the syntax and semantics of the
   named attribute contents; the intent is to promote interoperability
   where common interests exist.  While application developers are
   allowed to define and use attributes as needed, they are encouraged
   to register the attributes with IANA.

Nico
--

Jonathan Edwards

2006-Oct-09 15:16 UTC

[zfs-discuss] A versioning FS

On Oct 8, 2006, at 23:54, Nicolas Williams wrote:
> On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote:
>> On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
>>> You're arguing for treating FV as extended/named attributes :)
>>
>> kind of - but one of the problems with EAs is the increase/bloat in
>> the inode/dnode structures and corresponding incompatibilities with
>> other applications or tools.
>
> This in a thread where folks [understandably] claim that storage is
> cheap and abundant.  And I agree that it is.
>
> Plus, I think you may be jumping to conclusions about the bloat of
> extended attributes:
>
>>                               Another approach might be to put it all
>> into the block storage rather than trying to stuff it into the
>> metadata on top.  If we look at the zfs on-disk structure instead and
>> simply extend the existing block pointer mappings to handle the diffs
>> along with a header block to handle the version numbers - this might
>> be an easier way out rather than trying to redefine or extend the
>> dnode structure.   Of course you'd still need a single attribute to
>> flag reading the version block header and corresponding diff blocks,
>> but this could go anywhere - even a magic acl perhaps .. i would
>> argue that the overall goal should be aimed toward the reduction of
>> complexity in the metadata nodes rather than attempting to extend
>> them and increase the seek/parse time.
>
> Wait a minute -- the extended attribute idea is about *interfaces*,  
> not
> internal implementation.  I certainly did not argue that a file  
> version
> should be copied into an EA.
true, but I just find that the EA discussion is just as loaded as the FV
discussion that too often focuses on improvements in the metadata
space rather than the block data space.  I'm not talking about the file
version data .. rather the bplist for the file version data and possibly
causing this to live in the block data space instead of the dnode
DMU.  This way the FV will be completely accessible within the
filesystem block data structure instead of being abstracted back out
of the dnode DMU.  I would hold that the version data space
consumption should also be readily apparent on the filesystem level
and that versioned access should not impede the regular file
lookup or attribute caching.  It's a slight deviation from the typical
EA approach, but an important distinction to make to keep the
metadata structures relatively lean.
> Let's keep interface and implementation details separate.  Most of this
> thread has been about interfaces precisely because that's what users
> will interact with; users won't care one bit about how it's all
> implemented under the hood.
I'm not so sure you can separate the two without creating a hack.  I
would also argue that users (particularly the ones creating the
interfaces) will care about the implementation details since those
are the real underlying issues they'll be wrestling with.

.je

Nicolas Williams

2006-Oct-09 15:22 UTC

[zfs-discuss] A versioning FS

On Mon, Oct 09, 2006 at 11:16:41AM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 23:54, Nicolas Williams wrote:
> >Let's keep interface and implementation details separate.  Most of this
> >thread has been about interfaces precisely because that's what users
> >will interact with; users won't care one bit about how it's all
> >implemented under the hood.
> 
> I'm not so sure you can separate the two without creating a hack.  I
> would also argue that users (particularly the ones creating the
> interfaces) will care about the implementation details since those
> are the real underlying issues they'll be wrestling with.
I'm sure that we can.  And I'm sure that most users won't care one bit
how FV is implemented.

Nico
--

David Dyer-Bennet

2006-Oct-09 15:51 UTC

[zfs-discuss] A versioning FS

On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> David Dyer-Bennet wrote:
> > On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> >
> >> > >Maybe Erik would find it confusing.  I know I would find it
> >> > >_annoying_.
> >> >
> >> > Then leave it set to 1 version
> >>
> >> Per-directory?  Per-filesystem?
> >
> > Whatever.  What's the actual issue here?
> >
> > I don't recall that on TOPS-20 it was possible to not version.  What
> > you could do is set your logout.cmd file to purge your space down to
> > one copy when you logged out.
> But see, that assumes you have a logout-type functionality to use. Which
> indeed is possible for command-line usage, but then only in a very
> limited way.   During a typical session, I access almost 20 NFS-mounted
> directories. And anyone using autofs/automount trees gets even more.
> You're saying that my logout script has to know about all of them to
> keep things clean?  That's unrealistic.  And that still doesn't solve
> the problem of people who use SAMBA or NFS from machines which don't
> have an interactive shell logout system (i.e. Windows).
Seems entirely realistic to me that your logout script would know
about the things you routinely use.  People who don't log into any
system are more of a problem, though.  Various things come to mind,
like having a default number of files (so it doesn't expand without
limits), and maybe a regular cron job; but I've never worked in an
environment doing versioning for non-login users over the network, so
they're all theory, no idea how they'd work in practice.
>
> > This worked fine for the users I knew; even on a system that didn't
> > have as much as a gigabyte of disk storage total to support a few
> > dozen software engineers.
> >
> The problem is we are comparing apples to oranges in user bases here.
> TOPS-20 systems had a couple of dozen users (or, at most, a few
> hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of
> thousands.  Plus, the number of files being created under typical modern
> systems is at least two (and probably three or four) orders of magnitude
> greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000
> under my home directory.  And I don't have anything significant in my
> /home (no source code, no build/test trees, just misc business stuff).
> What is manageable with a few files quickly becomes unwieldy with more
> than a few dozen.
I have to ask again -- is this theory?  Or have you actually worked on
a versioning filesystem?  And specifically on TOPS-20?  (I remember,
vaguely, that people found VMS versioning MUCH less comfortable to
work with than TOPS-20, and I don't know at this distance if that was
just because it was different, or because of subtle UI differences).

I don't think the number of files under /usr is relevant; how often do
you edit them by hand?  I'd expect an installation procedure to clean
up old versions when it was done installing new software; but if not a
simple purge would settle the matter.

I don't recall my directories having much fewer files then than now.
I have more *directories* now, but the number of files in a directory
is set by human issues and by development process issues, not by disk
space available.
> This is what Nico and I are talking about:  if you turn on file
> versioning automatically (even for just a directory, and not a whole
> filesystem), the number of files being created explodes geometrically.
I don't see it; new versions are created *when you do something* to a
file; not from the file just sitting there.  And the number of files I
poke in a day, again, isn't controlled much by the disk space
available, it's controlled by *my time*, and so has stayed more
constant over the years.
> >> > The above should be simple to do however -- a program does an open of
> >> > a file name "foo.bar".  ZFS / the file system routine would use the
> >> > most recent version by default if no version info is given.
> >>
> >> How can version information be given without changing the APIs or
> >> putting the version number/string into the file name?
> >
> > The version number is part of the file name in all the examples I know
> > about.  I'd find it useless without that; it has to be a real part of
> > the filesystem, usable by everybody, not a special addon accessible
> > only with one or two dedicated applications.
> >
> >> Putting the version number/string into the file name is hard for me to
> >> accept.  It's what would lead to polluting my directories.
> >
> > Set your ls default to not show versions.  Isn't the problem then
> > solved?  Maybe add that option to the GUI filesystem explorer as well.
> >
> But this requires modifying all the relevant apps, which is the same
> amount of work as modifying them to use a new FV API.  It's not
> transparent to the end-user.
I think the relevant apps are very different in the two cases.  File
listing tools are much rarer than file using tools, and in my case you
only need to modify the file listing tools.  In your case, you have to
modify every single file using tool.
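
As a rough illustration of how small the change to a listing tool could be, here is a sketch that collapses versioned names to one entry per file. It assumes a VMS-style `name;N` suffix, which is only one possible convention; the pattern and the function name `latest_only` are hypothetical, not part of any real ls.

```python
import re

# Hypothetical naming convention: "<name>;<version>", e.g. "foo.bar;3".
VERSIONED = re.compile(r"^(?P<name>.+);(?P<version>\d+)$")


def latest_only(entries):
    """Collapse versioned entries to one line per base name.

    Entries without a ";N" suffix are passed through unchanged.
    """
    names = set()
    for entry in entries:
        match = VERSIONED.match(entry)
        names.add(match.group("name") if match else entry)
    return sorted(names)


listing = ["foo.bar;1", "foo.bar;2", "foo.bar;3", "notes.txt;1", "Makefile"]
print(latest_only(listing))
```

A flag to show every version would just skip the filter; programs that open files by name need no change under this scheme.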
> > In practice, it never was a problem that I noticed, or that other
> > people noticed.  And remember that this was on slower systems with
> > smaller screens and often rather slower screen update.
> >
> > Do you not like the idea based on theory, or did you actually use
> > TOPS-20 for a while and find the versioning troublesome?
> >
> Putting the file version number as part of the file name breaks things.
> Apps unaware of the special significance of this format will tend to
> write similar names, which can screw everything royally.
>
> Example:
>
> Say we use <file>;<version>
>
> In emacs, I edit FOO:2
>
> it will write out a temp file "FOO:2~".  So, how does the FS deal with
> this the next time they need to create a new version?
Whatever.  None of the choices is a disaster.  None of them "break"
anything.  I essentially never have to look at these files, in any
version, so it doesn't matter very much what their names are.

Possibly some clever definitions of how things are handled could make
the results cleaner, and that's worth looking at, but the worst
results I can imagine from this scenario are unimportant; they don't
hurt anything.
> The problem lies in that under VMS, the ';' was a special character, and
> unusable in normal naming. I suspect a similar situation exists under
> TOPS-20.  No such luck in a POSIX filesystem - all printable (and many
> unprintable) characters are valid for use in filenames. So you _CAN'T_
> use them to delineate File Versioning, without risking blowing the
> entire scheme when some random app decides to either use your FV marker
> for its own needs, or something similar to the emacs case above.
This is theory again.  In practice, there aren't such schemes in use
anywhere I can find.  If there are, yes, some file-versioning schemes
would break them, and those apps would have to be updated.

A theoretically clean approach is desirable, but an approach that
actually works is more important.  An approach that requires programs
to be updated before they can use file versioning doesn't, by my
standards, "work"; I wouldn't be able to use it with the files and
applications it's valuable to me for any time soon.

When you talk about a new API for versioning -- how do you envision
information being conveyed from the command lines of programs to this
new API?  Isn't it likely that it would end up becoming a part of file
name syntax, and changing the rules about allowable characters in
filenames?  And in that case, you can make the whole change in the
"open" and "link" calls, and get the same end effect.
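
A sketch of what "making the change in open" could look like: the name is split into a base name and an optional version, and a missing version means "newest". The `name;N` syntax and the helpers `split_version` and `resolve` are assumptions for illustration only; a real filesystem would consult its own metadata instead of the list passed in here.

```python
import re

# Hypothetical syntax: an optional ";<digits>" suffix selects a version.
SUFFIX = re.compile(r"^(?P<name>.+);(?P<version>\d+)$")


def split_version(path):
    """Split 'foo.bar;2' into ('foo.bar', 2); plain names give (path, None)."""
    match = SUFFIX.match(path)
    if match:
        return match.group("name"), int(match.group("version"))
    return path, None


def resolve(path, existing_versions):
    """Pick the version an open-for-read would use.

    existing_versions stands in for on-disk metadata: the version
    numbers already stored for the base name.
    """
    name, version = split_version(path)
    if not existing_versions:
        raise FileNotFoundError(name)
    if version is None:
        version = max(existing_versions)  # no version given: newest wins
    elif version not in existing_versions:
        raise FileNotFoundError(path)
    return name, version


print(resolve("foo.bar", [1, 2, 3]))
print(resolve("foo.bar;2", [1, 2, 3]))
```

Programs that never mention a version keep working unchanged, since a bare name resolves to the most recent version.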
> >> > one UI is the command line shell
> >>
> >> Indeed!  And command-line tools, like ls(1), find(1), etc...
> >>
> >> What I'm saying is that I'd like to be able to keep multiple versions of
> >> my files without "echo *" or "ls" showing them to me by default.
> >
> > And I find that completely unacceptable; useless.  The whole point of
> > putting versioning in the filesystem is that that makes it accessible
> > to all programs.
> >
> But, because of the explosion in the number of files, you CAN'T
> automatically show all versions. Users will NEVER accept this. The only
> clean way to do this is to show file versions only upon request. Not by
> default.
Is this theory, or do you have some experience to support it?  You say
"can't"; I'm not at all worried about it, myself.  I've worked in
these environments, and liked it very much.  I've watched new people
get introduced to them.  People like this when they see it
well-implemented.

I don't accept your assertion that directories people edit files in
have more files in them today than they used to, in general.  I also
don't accept the assertion that the number of extra versions scales
with the number of files in the directory -- it scales with the number
of files you re-write in the directory, which is limited by human
working speed and time in the day, not by the number of files there.
> >> > >What if an application deals in multiple files?
> >> >
> >> > so?
> >>
> >> So, file versions aren't useful unless the application explicitly
> >> tells the OS when to make them.
> >
> > File versions are created when a file is created.  In the scenario
> > where, today, an existing file would be overwritten (deleted), instead
> > the old file is kept and the new file is given the version number +1
> > of the old file.
> >
> >> Similarly with applications that keep files open but keep writing
> >> transactions in ways that the OS can't isolate without input from the
> >> app.  E.g., databases.  fsync(2) helps here, but lots and lots of
> >> fsync(2)s would result in no useful versioning.
> >
> > None of those are candidates for file versioning, and a darned good
> > thing, too.
>
> Honestly, as far as file versioning goes, the time to make a new version
> is when calling open() with the appropriate arguments to allow for
> append or modification. You obviously don't want to create a new version
> if you are only opening a file for read-only access, and changing
> version on fsync() is ludicrous, and on close() doesn't differentiate
> between a file which has been modified or not.
Yes, versioning is a file-create feature.
> Given this, we're back into the problem FV is supposed to solve.  It is
> entirely possible for an editor to keep open a file for a long time,
> periodically writing out your changes without issuing a new open().
You describe this as a problem, but *I* see it as the exact thing that
makes file versioning useful.  It DOESN'T save random magically chosen
moments; it saves exactly the versions that *you*, the user, saved
at some point in the editing session.
> Word with auto-save turned off is a prime example.   Given this, you've
> only created a new version when you first load the document, and all
> your intermediary changes are lost, since it only saves the document on
> close().
You're forgetting that the user, unless he's stupid, will save
regularly during the editing session.
> Thus, in order to get benefits from FV, your editor must
> issue periodic close() and open() commands on the same file, as you
> edit, all without your intervention.  Exactly how many editors do this?
> I have no idea.  So, the only way to enable FV is to require the user to
> periodically push the "Save" button. Which is how much more different
> than the current situation?
It is completely and utterly different from the current situation.  In
the current situation, when I type the "save" command *I am deleting a
previous version*.  That's dangerous, because people don't think of it
as performing a destructive operation, and hence don't give it the
care and consideration they give to an explicit "rm".  And that's
precisely what file versioning fixes; saving a file is no longer a
destructive operation.
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

David Dyer-Bennet

2006-Oct-09 16:03 UTC

[zfs-discuss] A versioning FS

On 10/7/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> Chad Leigh -- Shire.Net LLC wrote:
> >>>             Plus, the number of files being created under typical
> >>> modern systems is at least two (and probably three or four) orders
> >>> of magnitude greater.  I've got 100,000 files under /usr in Solaris,
> >>> and almost 1,000 under my home directory.
> >>
> >> wimp :-)  I count 88,148 in my main home directory.  I'll bet just
> >> running gnome and firefox will get you in the ballpark of 1,000 :-/
> >
> > None (well, maybe 1 or 2)  of which you edit and hence would not
> > generate versions.
> >
> > Chad
>
> Richard actually brings up a good point, which answers another question
> Chad had for me:  exactly how many files do I edit?   Which directly
> impacts the "directory pollution" problem I've been talking about.
>
> There are essentially three scenarios:
>
> (a)  FV is turned on on a per-file basis
>
> (b) FV is turned on on a per-directory basis
>
> (c) FV is turned on on a per-filesystem basis
>
>
> Now, I think we can all see that you get geometric file explosion in case
> (c), as absolutely anything that writes to the filesystem gets
> versioned.  Things like Web Browser caches alone would kill you.
Web browser caches (as normally used) would *never* generate a single
additional file version.  The web browsers use a naming algorithm to
prevent overwriting the same file, and that's the situation when a new
version is created.  They delete the files they decide they don't need
directly, rather than by overwriting the same name.

Your use of "writes to the filesystem" suggests to me you're thinking
of a different implementation of versioning than was in TOPS-20 and
VMS, and that (I think) most of us are discussing here.   The kind of
versioning I'm talking about works by keeping old versions of a file
*when it's overwritten by a new version*.  It's the operation of
creating a new file with the same name as an old file that triggers
it; in current Unix semantics the old file is deleted, but in the kind
of FV I'm talking about, the old version is *kept* and the new version
is given an incremented version number to keep the names unique.

It has nothing to do with writing to files; if you update a file in
place, a new version isn't generated.
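
The distinction between creating a file and updating it in place can be sketched in user space. This is only a simulation under stated assumptions: versions are stored as `name;N` files in a scratch directory, and `create`, `update_in_place` and `versions_of` are hypothetical helpers, not the TOPS-20, VMS, or ZFS mechanism.

```python
import os
import re
import tempfile


def versions_of(directory, name):
    """Return the sorted version numbers stored for `name` in `directory`."""
    pattern = re.compile(re.escape(name) + r";(\d+)$")
    found = []
    for entry in os.listdir(directory):
        match = pattern.match(entry)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


def create(directory, name, data):
    """Create-style write: never overwrites, always adds version max+1."""
    existing = versions_of(directory, name)
    version = existing[-1] + 1 if existing else 1
    path = os.path.join(directory, f"{name};{version}")
    with open(path, "x") as handle:  # "x" fails rather than overwrite
        handle.write(data)
    return version


def update_in_place(directory, name, data):
    """In-place write: appends to the newest version, creates no new one."""
    version = versions_of(directory, name)[-1]
    path = os.path.join(directory, f"{name};{version}")
    with open(path, "a") as handle:
        handle.write(data)
    return version


workdir = tempfile.mkdtemp()
create(workdir, "foo.bar", "first draft\n")
create(workdir, "foo.bar", "second draft\n")       # an editor "save"
update_in_place(workdir, "foo.bar", "appended\n")  # no new version
print(versions_of(workdir, "foo.bar"))
```

Only the two create calls leave versions behind; the append changes the newest version without adding one, which is why a log file or database would not multiply under this scheme.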
-- 
David Dyer-Bennet, <mailto:dd-b at dd-b.net>,
<http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>

Larry Becke

2006-Oct-09 18:15 UTC

[zfs-discuss] Re: A versioning FS

Why not see if you can find (or write, or have written) an editor that does the
version name changes for you?

i.e. - each time you save, or each auto-save, it writes a different version of
the file, and when you exit, it asks if you'd like to retain the other
versions or not?

Sounds like it would be a LOT simpler to do, and with snapshots for everything
else, I don't see a need for a version name changing filesystem.
 
 
This message posted from opensolaris.org

Bill Sommerfeld

2006-Oct-10 12:47 UTC

[zfs-discuss] Re: A versioning FS

On Fri, 2006-10-06 at 00:07 -0700, Richard L. Hamilton wrote:
> Some people are making money on the concept, so I
> suppose there are those who perceive benefits:
> 
> http://en.wikipedia.org/wiki/Rational_ClearCase
> 
> (I dimly remember DSEE on the Apollos; ...)
I used both fairly extensively.  Much of the Apollo DSEE team left HP to
write ClearCase.  Neither is a versioning filesystem; instead, both are
software configuration management systems which export a limited virtual
filesystem interface.   With such systems, versioning is not transparent
but instead involves interaction with a CLI or GUI around
checkout/checkin.

Joerg Schilling

2006-Oct-11 18:24 UTC

[zfs-discuss] A versioning FS

Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote:
> > Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> > 
> > > You're arguing for treating FV as extended/named attributes :)
> > >
> > > I think that'd be the right thing to do, since we have tools that are
> > > aware of those already.  Of course, we're talking about somewhat magical
> > > attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
> > > some strange verbiage limiting attributes to "applications").
> > 
> > I thought NFSv4 supports extended attributes. What "limiting" are you
> > aware of?
>
> It does.  I meant this on pg. 12:
>
>                                                  [...]  Named attributes
>    are meant to be used by client applications as a method to associate
>    application specific data with a regular file or directory.
FreeBSD and Linux implement something different also called extended attributes.
There should be a possibility to map from FreeBSD/Linux to Solaris.
> and this on pg. 36:
>
>    Named attributes are intended for data needed by applications rather
>    than by an NFS client implementation.  NFS implementors are strongly
>    encouraged to define their new attributes as recommended attributes
>    by bringing them to the IETF standards-track process.
See above... Ever since extended attributes appeared on Solaris (8 update???),
I have been looking for a way to map simple extended attribute implementations
such as those on Mac OS, FreeBSD and Linux to the more general implementation
on Solaris.

Before we start defining the first official functionality for this Sun feature,
we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense to
define a sub-directory of the attribute directory for keeping old versions
of a file.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)  
       schilling at fokus.fraunhofer.de     (work) Blog:
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Nicolas Williams

2006-Oct-11 19:18 UTC

[zfs-discuss] A versioning FS

On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> Before we start defining the first official functionality for this Sun feature,
> we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to
> define a sub directory for the attribute directory for keeping old versions
> of a file.
A sub-directory would definitely be needed, yes, but I don't agree with
the first part.

Joerg Schilling

2006-Oct-13 09:03 UTC

[zfs-discuss] A versioning FS

Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> > Before we start defining the first official functionality for this Sun feature,
> > we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to
> > define a sub directory for the attribute directory for keeping old versions
> > of a file.
>
> Definitely a sub-directory would be needed yes, and I don't agree to the
> first part.
Why not?

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)  
       schilling at fokus.fraunhofer.de     (work) Blog:
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Nicolas Williams

2006-Oct-13 16:13 UTC

[zfs-discuss] A versioning FS

On Fri, Oct 13, 2006 at 11:03:51AM +0200, Joerg Schilling wrote:
> Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> 
> > On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> > > Before we start defining the first official functionality for this Sun feature,
> > > we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to
> > > define a sub directory for the attribute directory for keeping old versions
> > > of a file.
> >
> > Definitely a sub-directory would be needed yes, and I don't agree to the
> > first part.
> 
> Why not?
Because I don't see how creating a sub-directory of the EA namespace for
storing FVs will step on the toes of anyone trying to map other
platforms' notions of EA onto Solaris'.  Is this being too optimistic?

Nico
--
