Hi, I have checked the wiki and all the documentation here on this site, and I still need an answer for a conference paper I am writing: can ZFS produce event-driven snapshots? That is, snapshots of specific files or of the file system, taken in the event of a change? This question has eluded me until now.

Uwe

This message posted from opensolaris.org
Come on! Nobody?! I have read through the documents for several hours, so I have obviously done my homework. Can someone please point me to a link, or just unambiguously say 'yes' or 'no' to my question: can ZFS produce a snapshot of whatever type, initiated by a signal that is in turn derived from a change (edit) of a file, like inotify in Linux 2.6.13 and above?

Uwe
On Feb 23, 2008, at 10:57, Uwe Dippel wrote:

> Come on! Nobody?!
> I read through documents for several hours, and obviously done my work.
> Can someone please point me to a link, or just unambiguously say 'yes' or 'no'
> to my question, if ZFS could produce a snapshot of whatever type, initiated
> with a signal that in turn is derived from a change (edit) of a file; like
> inotify in Linux 2.6.13 and above.

Solaris has /dev/poll, so if you want to code something up that sets up a trigger, you can. The action on that trigger can then be running the 'zfs snapshot' command (or anything else, for that matter). Solaris is UNIX(tm), where the philosophy is generally many discrete tools that can be linked together.
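For what it's worth, the trigger-plus-action idea can be sketched in a few lines. Everything below is hypothetical: the class, the snapshot-naming scheme, and the mtime-polling watcher are all made up for illustration (a real implementation would hook a proper event source such as Solaris event ports or Linux inotify, and actually exec `zfs snapshot`). The command runner is injectable so the sketch can be exercised without a real `zfs` binary:

```python
import time

def make_snapshot_name(dataset, clock=time.time):
    # Hypothetical naming scheme; ZFS only requires snapshot names to be
    # unique within the dataset.
    return "%s@auto-%d" % (dataset, int(clock()))

class ChangeTriggeredSnapshotter:
    """Poll file mtimes and fire a snapshot command on any change.

    A portable stand-in for a real event source; `stat_fn`, `run`, and
    `clock` are injectable so the logic can be tested in isolation.
    """
    def __init__(self, dataset, stat_fn, run, clock=time.time):
        self.dataset = dataset
        self.stat_fn = stat_fn      # path -> mtime
        self.run = run              # argv list -> None (e.g. subprocess call)
        self.clock = clock
        self.seen = {}              # last observed mtime per path

    def check(self, paths):
        changed = False
        for p in paths:
            mtime = self.stat_fn(p)
            if self.seen.get(p) != mtime:   # first sighting counts as a change
                self.seen[p] = mtime
                changed = True
        if changed:
            self.run(["zfs", "snapshot",
                      make_snapshot_name(self.dataset, self.clock)])
        return changed
```

Running `check` in a loop (or from a /dev/poll or event-port callback) gives the "many discrete tools linked together" behavior described above.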
I'm not answering from experience, but a quick google found that Solaris does have file change notification: http://blogs.sun.com/praks/entry/file_events_notification

So I'd have thought you could use that to trigger a ZFS snapshot. ZFS snapshots aren't of any one particular file; they are of the whole filesystem. But they don't consume a lot of space (so long as you don't expect file deletions to free space ;-) ), and you could create lots of sub-filesystems, I guess, if you needed to. Would that do the job?
> google found that solaris does have file change notification:
> http://blogs.sun.com/praks/entry/file_events_notification

Didn't see that one, thanks.

> Would that do the job?

It is not supposed to do a job, thanks :), it is for a presentation I will be giving at a conference. I was wondering if Solaris offered CDP, Continuous Data Protection: http://en.wikipedia.org/wiki/Continuous_data_protection

To me, "Continuous data protection is different from traditional backup in that you don't have to specify the point in time to which you would like to recover until you are ready to perform a restore. Traditional backups can only restore data to the point at which the backup was taken. With continuous data protection, there are no backup schedules" is a perfect match for the revolutionary design of ZFS; and I am sure the next Time Machine from Apple will have it.

File change notification as a system call is probably not the optimum. The overhead could be pretty high compared with the file system (ZFS) itself noticing a change, a 'write', and on its own initiative storing the block to be replaced in a 'back-in-time' queue.

Time Machine is limited, as it allows rolling back to specified moments in time only. Multiple changes in between those snapshots will be lost. A StateMachine would allow rolling back to any specific state, to every state a file ever had. Think log files of an RDBMS, without the overhead.

Uwe
Uwe Dippel wrote:

> Time Machine is limited, as it allows rolling back to specified moments in
> time only. Multiple changes in between those snapshots will be lost.
> A StateMachine would allow rolling back to any specific state, to every
> state a file ever had. Think log files of an RDBMS, without the overhead.

ZFS can't currently provide continuous data protection. Even if you triggered snapshots with every file change, on a large or busy file system the snapshots would not be able to keep up, and you would quickly overload your system. Snapshots are lightweight, but not resource-free. It's the wrong way to time-stamp your data. Too global.

I think this would be a very worthwhile project for the ZFS team to take on.
CDP could be implemented easily by simply marking every block as write-once, with a policy in place to expire old blocks according to a user-defined parameter. This would ensure free space remains in the file system. Then AVS could be deployed for remote replication, and bingo. Maybe I'm missing something, but this seems like a no-brainer.

Train of thought here, but you could do better than most CDP offerings, in that you could not only offer to recover the file system from an instant in time, you could selectively recover a single file from an instant in time. That would be killer. In some circles, CDP is big business. It would be a great ZFS offering.

Jon
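As a toy model of the "never overwrite, expire by policy" idea (all names are invented for illustration; this is not how ZFS actually allocates or frees blocks): superseded block versions go to a history list instead of being freed, any past state can be read back by timestamp, and a retention policy reclaims space by dropping old superseded versions.

```python
class CDPBlockStore:
    """Toy CDP block store: writes never overwrite in place; superseded
    versions are kept until a retention policy expires them."""
    def __init__(self):
        self.current = {}    # block_no -> (t, data), the live version
        self.history = []    # (t, block_no, data) superseded versions

    def write(self, t, block_no, data):
        old = self.current.get(block_no)
        if old is not None:
            # Retire the old version rather than freeing it.
            self.history.append((old[0], block_no, old[1]))
        self.current[block_no] = (t, data)

    def read_as_of(self, t, block_no):
        # Latest version of the block written at or before time t.
        versions = [(vt, d) for (vt, b, d) in self.history if b == block_no]
        cur = self.current.get(block_no)
        if cur is not None:
            versions.append(cur)
        best = None
        for vt, d in versions:
            if vt <= t and (best is None or vt > best[0]):
                best = (vt, d)
        return None if best is None else best[1]

    def expire_before(self, cutoff):
        # Retention policy: drop superseded versions older than cutoff to
        # keep free space in the pool; live blocks are never expired.
        kept = [v for v in self.history if v[0] >= cutoff]
        freed = len(self.history) - len(kept)
        self.history = kept
        return freed
```

The per-file recovery Jon describes would then just be `read_as_of` applied to the blocks of one file rather than the whole pool.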
On Feb 24, 2008, at 01:49, Jonathan Loran wrote:

> In some circles, CDP is big business. It would be a great ZFS offering.

ZFS doesn't have it built-in, but AVS may be an option in some cases:

http://opensolaris.org/os/project/avs/
David Magda wrote:

> ZFS doesn't have it built-in, but AVS may be an option in some cases:
>
> http://opensolaris.org/os/project/avs/

Point-in-time copy (as AVS offers) is not the same thing as CDP. When you snapshot data as in point-in-time copies, you predict the future, knowing the time slice at which your data will be needed. Continuous data protection is based on the premise that you don't have a clue ahead of time which point in time you want to recover to. Essentially, for CDP, you need to save every storage block that has ever been written, so you can put them back in place if you so desire.

Does anyone else on the list think it is worthwhile adding CDP to the ZFS list of capabilities? It causes space management issues, but it's an interesting, useful idea.

Jon
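Jon's distinction can be made concrete with a small (hypothetical) helper: given the times of completed writes, point-in-time snapshots let you recover only the state that happened to be current when each snapshot was taken, while CDP makes every completed write a restore point.

```python
def recoverable_states(write_times, snapshot_times=None):
    """Return the write timestamps whose states can be recovered.

    snapshot_times=None models CDP: every completed write is recoverable.
    Otherwise, each snapshot captures only the latest write at or before it.
    """
    if snapshot_times is None:
        return sorted(set(write_times))
    states = set()
    for st in snapshot_times:
        prior = [w for w in write_times if w <= st]
        if prior:
            states.add(max(prior))   # the state the snapshot froze
    return sorted(states)
```

With writes at t=1,2,3,7,8 and snapshots at t=5 and t=10, only the t=3 and t=8 states survive; CDP keeps all five.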
And it would drive storage requirements through the roof!! I like it! ;)

Nathan.

Jonathan Loran wrote:

> Point in time copy (as AVS offers) is not the same thing as CDP. [...]
>
> Anyone else on the list think it is worthwhile adding CDP to the ZFS
> list of capabilities? It causes space management issues, but it's an
> interesting, useful idea.
>
> Jon
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Jonathan Loran wrote:

> Anyone else on the list think it is worthwhile adding CDP to the ZFS
> list of capabilities? It causes space management issues, but it's an
> interesting, useful idea.

It might be interesting, but only for a limited set of applications. For applications which do things like mmap or write multiple files concurrently, I think keeping consistency will be difficult. In the long run, you'll be better off making the applications implement their own, contextual data replication.

Another sort of application which won't work well with this is one that writes an entire file for each record written. One such example is pkgadd (reason #5238 why IPS is much better than the old SVR4 packaging system).

-- richard
> And would drive storage requirements through the roof!!

The interesting part is, Nathan, that you're probably wrong. First, though: some of my contacts in the enterprise gladly spent millions on third-party applications running on Microsoft to do exactly that. [But we all know that Sun is famous for almost always missing the departing train.]

I have no proof for what I state, though my hypothesis is just the opposite. Increasing backup frequency simply requires more storage; I think we can all agree on that. It is therefore my assumption that:

1. cron-like jobs will consume more and more resources (undoubtedly);
2. at some point, as one increases the density of cron-like (time-line) backups, the amount of metadata is going to exceed the amount of actual changes.

Of course, you are right, it depends on the applications. Though I guess that, very roughly, an hourly backup (Time Machine) is already close to that point, at least on an average home box. (Of course, you don't include /tmp in *any* such observations!)

The disadvantages of something like Time Machine are manifold; worst of all is that it works at the level of files. Someone mentioned that doing it through the application would be more useful. Welcome to the 20th century. The argument is wrong, by the way, since such logs (RDBMS) are high-level, usually human-readable append operations on the file system, made through the application and operating system. Yes, they *are* useful, very useful, for a limited number of specific operations. But this argument fades very much within the context we discuss here. 'Backup' is not 'archive'; it is comprehensive. So doing the task of CDP at that level is suicide for system resources (cycles) as well as for storage space.

Back to my hypothesis: increasing backup frequency increases the amount of data (my hypothesis: metadata beyond 'change data'), while offering 'only' a near-CDP experience.
Once the time slots are done away with, the only data to be stored will be actual 'change data', meaning that the amount of metadata will be greatly reduced, and one can achieve real CDP. A prerequisite, though, is probably that the notion of 'file change' is not handled at a system-wide level, but locally to the change, the 'write' process. And, of course, it is obligatory to perform it at a block level, incrementally, you name it.

I challenge you to follow up on this matter. My interest arose due to a presentation I'll be giving shortly at a conference. To me, when preparing, it was obvious that ZFS could do this (CDP). Just wanted to make sure; surprise, surprise. Now I am really interested in proving my hypothesis that, regardless of the technology involved, the 'change data', the amount of actual changes to be stored for CDP, is below the amount of data (and resources) required for near-CDP at the level of files. Contact me offline if you feel like sponsoring this research.

Uwe
How do you use CDP "backups"? How do you decide at which write(2) (or dirty page write, or fsync(2), ...) to restore some file? What if the app has many files? Point-in-time? Sure, but since you can't restore all application state (unless you're checkpointing processes too), how can you be sure that the data to be restored is internally consistent? And if you'll checkpoint processes, then why not just use VMs and checkpoint those and their filesystems instead? The last option sounds much, much simpler to manage: there's only a VM name and timestamp to think about when restoring. A continuous VM checkpoint facility sounds... unlikely/expensive, though.

Nico
--
On Tue, Feb 26, 2008 at 2:07 PM, Nicolas Williams <Nicolas.Williams at sun.com> wrote:

> How do you use CDP "backups"? How do you decide at which write(2) (or
> dirty page write, or fsync(2), ...) to restore some file? What if the
> app has many files? [...]

Sorry, I don't understand any of this. But I never pretended I did. My post was on something else. In principle we have three types of write (atomic view, please):

1. Create. The new file needs to be written only; no backup/CDP needed; identical to any conventional system.
2. Edit/Modify. Here we need to store some incremental/differential file content. rsync-like, that is.
3. Remove. This is also similar to the conventional system, except that the files need to be retired and the blocks *not* marked as 'available'.

Changes combined with a 'write'/'Save' instruction are not seen very frequently on personal/home machines. (Let's leave out web caches and /tmp.) But even on the servers that I am running, the gigabytes of user data do not change very much, seen as a percentage of the overall data. Most of the 200,000 files that the users have remain unmodified for ages. Office files do change, but also not much faster than the users can type ;). Web content changes rarely; style sheets and icons remain unmodified close to forever. The largest changes come with system/software upgrades. (One might even discuss excluding these from CDP, and rather automating a snapshot before, in case of a problem thereafter.
But that is not my topic here and now.)

Also, the granularity of the 'backups' does not really have to be 100%. If, for reasons I cannot imagine, a certain file were marked for 'save' thrice in a single second, of course you don't need all the states. You do have the state at the start of that one second (to which you can roll), as well as the state at the end of that second (to which you can roll just as well; and you can even roll back and forward). I can hardly imagine a data file to which one would want to roll which was invalid at the start of that second, is invalid at the end, but was valid for some milliseconds in between. (How one could know about this intermediate correctness would have to be asked.) Outside of databases, a valid state once per 10 seconds is probably even overdone.

Don't forget: even if you deleted the file, it will still be there. If you 'save' a file, make a change, 'save' again, make a mistake and 'save' again, notice you made a mistake ... and all this within 10 seconds! ... you will still have the state at the beginning of the 10 seconds, as well as the state at the end of those 10 seconds. 10 seconds are a hell of a lot of time to calculate and store an incremental difference. Of a single file. Whereas in a Time Machine, 10 seconds can be a hell of a short time. Plus there is the huge overhead there, because you need to poll regularly, possibly at much too high a level, to find which files have been changed. Actually, chances are that none at all has changed (at least in the /home of the user, or even of the user*s*). Once it is event-driven, 'no change' means no activity at all. Once it is event-driven, and you have 3 changes in 10 seconds, I am pretty sure that all states can be handled without much trouble.

Uwe
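The "three saves in one second" argument amounts to coalescing versions per time window. A minimal sketch of that idea (the function and event format are made up for illustration): keep only the last completed write in each window; the "state at the start of the window" is then simply the last kept write of the previous window.

```python
def coalesce_versions(events, window):
    """Keep one version per time window.

    `events` is a time-ordered list of (timestamp, state) pairs, one per
    completed write of a single file. Within each window of length
    `window`, only the last write is retained; intermediate states are
    dropped, matching the argument that nobody needs a version that was
    valid only for milliseconds.
    """
    kept = {}
    for t, state in events:
        kept[t // window] = (t, state)   # last write in each window wins
    return [kept[k] for k in sorted(kept)]
```

So three saves at t=0, t=0.2 and t=0.7 collapse to the t=0.7 state, while a save ten seconds later opens a new window and is kept.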
On Wed, Feb 27, 2008 at 01:45:41AM +0800, Uwe Dippel wrote:

> Sorry, I don't understand any of this. But I never pretended I did.

Well, if you want some feature then you should understand what it is. Sure, "continuous data protection" sounds real good, but you have to understand that any CDP solution has to have knowledge of, or even be driven by, your applications; otherwise it isn't really CDP. This is explained below.

> My post was on something else:
> In principle we have three types of write; atomic view, please:

"Atomic view"?

> 1. Create. The new file needs to be written only, no backup/CDP
> needed; identical to any conventional system.
> 2. Edit/Modify. Here we need to store some incremental/differential
> file content. rsync-like, that is.

The rub is this: how do you know when a file edit/modify has completed? The answer is: it depends on what application we're talking about!

> 3. Remove. Also this is similar to the conventional system, except
> that the files need to be retired and the blocks *not* be marked as
> 'available'.

If an application has many files then an "edit/modify" may include updates and/or removals of more than one file. So once again: how do you know when an edit/modify has completed? The answer is still the same.

My point is this: because the interesting times at which to take checkpoints are application-specific, we can't have a useful application-independent CDP solution. An application-independent CDP solution would not necessarily (not likely!) produce checkpoints that are safe to restore to. If you don't know whether it's safe to restore to a given checkpoint, and finding out is "hard", then what use is that checkpoint? And if you know it isn't safe then the checkpoint is truly useless; it'll just sit there, taking up space. CDP really must be an application feature.
Using ZFS snapshots could certainly make it easier to implement app-level CDP, and having the ability to snapshot/clone at a finer granularity than datasets (e.g., per-file) would help too. But ZFS _alone_ cannot provide a useful CDP solution. Nico --
> Can someone please point me to a link, or just unambiguously say 'yes'
> or 'no' to my question, if ZFS could produce a snapshot of whatever
> type, initiated with a signal that in turn is derived from a change
> (edit) of a file; like inotify in Linux 2.6.13 and above.

Hi Uwe,

I wasn't previously familiar with inotify, so I may be off here... But as I understand it, inotify generates asynchronous events, which something else consumes (e.g. a backup tool). I believe the asynchronous nature of inotify prevents it from enabling "true" CDP, i.e. it would enable very frequent backups, but there may still be rewrites occurring before the first async event is delivered and processed. But based on your later comments... I think you're just looking for very frequent backups, not necessarily capturing every unique file version?

You might want to look at the information we've started posting about ADM (an HSM). There are two general use cases for ADM: a backup solution, and a disk extender. ADM will be using a subset of DMAPI to monitor file system activity. After skimming some brief info on inotify, I believe DMAPI is similar to inotify. ADM will use this to receive file modification events (among other event types), which, based on policy, will trigger archive requests to tape and/or disk archives. Note that ADM will archive only whole files (not just the incremental changes).

Additionally, since it's an HSM, archived files may (based on policy, etc.) be released from the file system. This is the "disk extender" part. Think of it as an "under the covers truncate" that frees the disk space. When the file data is accessed in the future, events trigger ADM to stage the file back in from the archives. Users would notice a delay (as it is staged in), but would not have to take explicit action to get the file data resident again. Releasing files will of course be optional.
ADM could provide frequent backups, if configured to make archives soon after file modifications. Since we archive the whole file, this would not be appropriate for large files with frequent small changes. Also, frequent backups would only be appropriate for disk archiving (due to tape load times and tape wear). Keep in mind that CDP is not the design center here. If configured to approach CDP behavior on a rapidly changing filesystem, one can imagine it hammering the filesystem and still not keeping up.

Also, ADM archives are very different from ZFS snapshots. We have not yet defined how a user would explicitly access a specific archive. The expectation is that we'll provide a way to see all the versions we have for a file, and the user can tell us to either restore one over the current contents of the file, or restore to a new file.

http://opensolaris.org/os/project/adm/WhatisADM/

-Joe
> "atomic view"?

Your post was on the gory details of how ZFS writes. "Atomic view" here means that the 'save' of a file is an 'atomic' operation: at one moment in time you click 'save', and at some other moment in time it is done. It means indivisible, and from the perspective of the user this is how it ought to look.

> The rub is this: how do you know when a file edit/modify has completed?

Not for me to say, I'm sorry; this is the task of the engineer, the implementer. (See 'atomic', as above.) It would be a shame if a file system never knew when the operation was completed.

> If an application has many files then an "edit/modify" may include
> updates and/or removals of more than one file. So once again: how do
> you know when an edit/modify has completed?

So an 'edit' fires off a few child processes to do this and that, and then you forget about them, hoping for them to do a proper job. Oh, this gives me confidence ;)

No, seriously, let's look at some applications:

A. A user works in Office (StarOffice, sure!) and clicks 'Save' on the current work before making major modifications. So the last state of the document (odt) is stored. Currently we can set some backup option to run regularly, meaning that the backup could have happened at the very wrong moment; saving the state on each user request to 'Save' is much better.

B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion process will have fired up the CDP engine ('event') and retired the files instead of deleting them. So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event).

C.
(Sticking with /home/) I agree with you that the rather continuous changes to the dot-files and dot-directories in the user's HOME that serve JDS, and many more, may not necessarily allow reconstituting a valid state of the settings at each and every moment. Still, chances are high that they will. In the worst case, the unlucky user can roll back to when he last took a break, if only to grab another coffee; because that took a minute, the writes (see above) will hopefully have completed. "Oh, s***", already messed up the settings? Then try to roll back to the lunch break. Works? Okay! But when you roll back to the lunch break, where is the stuff done in between? With the backup solution, it is lost. With the event-driven one (CDP), it is not: you can roll over all the states of files and directories between the lunch break and now, and recover the third-latest version of your tendering document (see above), within the settings of the desktop that were valid this morning.

Maybe Sun can't do this, but wait for Apple, and OSX10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know when their 'writes' are done.)

Uwe
Are you indicating that the filesystem knows, or should know, what an application is doing? It seems to me that to achieve what you are suggesting, that's exactly what it would take. Or you are assuming that there are no co-dependent files in the applications that are out there... Whichever the case, I'm confused...! Unless you are perhaps suggesting an ioctl that an application could call to indicate "I'm done for this round, please snapshot", or something to that effect. Even then, I'm still confused as to how I would do anything much more useful with this than, say, 1-minute snapshots.

Nathan.

Uwe Dippel wrote:

> So an 'edit' fires off a few child processes to do this and that, and then
> you forget about them, hoping for them to do a proper job.
> Oh, this gives me confidence ;)
>
> No, seriously, let's look at some applications: [...]
>
> Maybe Sun can't do this, but wait for Apple, and OSX10-dot-something (using
> ZFS as default!) will know how to do it. (And they probably also know when
> their 'writes' are done.)
>
> Uwe
[i]I think you're just looking for frequent backups, not necessarily capturing every unique file version.[/i]

Thanks for your reply, Joe, but this is not my intention. I agree that my arguments here look like moving targets; they simply developed along the lines of the discussion. I'd still target every unique file version. Of course, not the transient ones, only those versions that have been written completely to disk. We will, for a looong time, not be able to reconstitute each and every moment in time. Though I am pretty sure we can achieve a reconstitution of each and every moment at which a write operation completed.

If Nico was correct, the whole of ZFS wouldn't make sense. If Nico was correct, even with 'the other operating system' data would frequently be lost. Just think of a crash, a power outage without UPS: we don't know the states of the files, but in 99.9% of the cases, the states of the files on the hard drive allow for a proper reboot. Meaning, AFAICS, that the state of files on a hard drive is usually consistent. Even with VFAT, or UFS.

When I do very frequent backups (once per minute, e.g.), I get a lot of overhead, metadata, and system activity, on almost all unmodified files. And still, I might miss a relevant change. I was arguing in the other post that once I do very, very frequent backups (once per second, e.g.), I will be fine, because I have the state before and after that second. Even *true* CDP would probably not require that intermediate state (again, aside from some specific applications, like databases; but that is solved within the applications), which also might not have been completely written to the drive. This is, I understand, where Nico is in agreement with me: any completed write needs to be CDP-ed. And here we reach square one: while all those inotify-s and file_events_notifications are needed for TimeMachine, my fear is still that they work at too high a level and need too many resources. As I wrote, I have no clue about the internals of ZFS, but I was hoping the file system itself could do all that is necessary.

> If configured to approach CDP behavior on a rapidly changing filesystem, one can imagine
> ADM hammering a filesystem and still not keeping up.

Again, too-frequent polling wastes resources. As long as we have the notion of time-induced backups, we're lost in any case. But even polling a flag and getting into action is wastage. Again, probably the file system itself needs to know how, and perform the right action on its own.

Uwe

This message posted from opensolaris.org
Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
> Are you indicating that the filesystem knows or should know what an
> application is doing?

Maybe "snapshot the file whenever a write file descriptor is closed", or somesuch?

- Marcus
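Marcus's "snapshot on close of a write descriptor" idea can be sketched in user space. This is a toy illustration only, assuming a copy-based `take_snapshot` hook stands in for the real `zfs snapshot` command (a real trigger would use the kernel's file event mechanism rather than an application wrapper):

```python
import os, shutil, tempfile, time

def take_snapshot(path, history_dir):
    """Stand-in for 'zfs snapshot': copy the file into a timestamped slot.
    A real implementation would shell out to the zfs command instead."""
    dst = os.path.join(history_dir, "%s.%d" % (os.path.basename(path), time.time_ns()))
    shutil.copy2(path, dst)
    return dst

class SnapshotOnClose:
    """File wrapper that fires the snapshot hook when a write descriptor closes."""
    def __init__(self, path, history_dir):
        self.path, self.history_dir = path, history_dir
        self.f = open(path, "w")
    def write(self, data):
        self.f.write(data)
    def close(self):
        self.f.close()                 # the write descriptor is gone...
        return take_snapshot(self.path, self.history_dir)  # ...so capture a version

history = tempfile.mkdtemp()
doc = os.path.join(tempfile.mkdtemp(), "letter.txt")
w = SnapshotOnClose(doc, history)
w.write("draft 1")
snap = w.close()                       # close is the 'event' that snapshots
print(open(snap).read())               # -> draft 1
```

As the follow-ups below point out, close of a single descriptor is a weak consistency signal; the sketch only shows where such a trigger would sit.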
Uwe Dippel <udippel at gmail.com> writes:
> Any completed write needs to be CDP-ed.

And that is the rub, precisely. There is nothing in the app <-> kernel interface currently that indicates that a write has completed to a state that is meaningful to the application.
Uwe Dippel wrote:
>> atomic view?
>
> Your post was on the gory details of how ZFS writes. "Atomic view" here means that 'save' of a file is an 'atomic' operation: at one moment in time you click 'save', and at some other moment in time it is done. It means indivisible, and from the perspective of the user this is how it ought to look.
>
>> The rub is this: how do you know when a file edit/modify has completed?
>
> Not to me, I'm sorry; this is the task of the engineer, the implementer. (See 'atomic', above.)
> It would be a shame if a file system never knew whether an operation had completed.

This is the consistency problem. It isn't enough to know a write() completed; you must also know that a group of write()s leaves the file in a state which is consistent for the application.

>> If an application has many files then an "edit/modify" may include
>> updates and/or removals of more than one file. So once again: how do
>> you know when an edit/modify has completed?
>
> So an 'edit' fires off a few child processes to do this and that, and then you forget about them, hoping that they do a proper job.
> Oh, this gives me confidence ;)
>
> No, seriously, let's look at some applications:
>
> A. The user works in Office (StarOffice, sure!) and clicks 'Save' on the current work before making major modifications. So the last state of the document (odt) is stored. Currently we can set some Backup option to run regularly. Meaning that the backup could have happened at the very wrong moment, while saving the state on each user request to 'Save' is much better.

StarOffice can record changes, so you should never lose a change, no? Other editors and office suites have similar features. Some editors even keep backup copies of modified documents.

> B. A bunch of e-mails are read from the Inbox and stored locally (think Maildir). The user sees the sender, doesn't know her, and deletes all of them. Of course, the deletion process will have fired up the CDP engine ('event') and retired the files instead of deleting them. So when the sender calls, and the user learns that he made a big mistake, he can roll back to before the deletion (event).

SOX compliance? ;-)

> C. (Sticking with /home/) I agree with you that the rather continuous changes to the dot-files and dot-directories in the user's HOME that serve JDS, and many more, may not necessarily allow a valid state of the settings to be reconstituted at each and every moment. Still, chances are high that they will. In the worst case, the unlucky user can roll back to when he last took a break, if only to grab another coffee; because that took a minute, the writes (see above) will hopefully have completed. "Oh, s***", already messed up the settings? Then try to roll back to the lunch break. Works? Okay! But when you roll back to the lunch break, where is the work done in between? With the backup solution it is lost. With the event-driven solution (CDP) it is not: you can roll over all the states of files or directories since the lunch break and recover the third-latest version of your tendering document (see above), within the settings of the desktop that were valid this morning.

Actually, there is a case where you wouldn't want this enabled for $HOME in general. I use a browser every day; actually, I use several browsers every day. Each browser has a cache located somewhere in my home directory, and the cache is managed so that it won't grow very large. With CDP, I would fill my disk in a week or less, just by caching everything on the internet that I pass by. Similarly, I have an e-mail account that is POP-based and tends to collect large amounts of spam which, due to some irritating circumstances, I can't remotely filter. I *really* don't want to fill up my disk with enlargement spam. The only thing that would get larger is my disk space requirement :-)

> Maybe Sun can't do this, but wait for Apple, and OS X 10-dot-something (using ZFS as default!) will know how to do it. (And they probably also know when their 'writes' are done.)

I use Firefox and Thunderbird on my Mac... so I guess I would fill up my disk with the internet and spam ;-/
-- richard
On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
> Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
> > Are you indicating that the filesystem knows or should know what an
> > application is doing?
>
> Maybe "snapshot the file whenever a write file descriptor is closed", or
> somesuch?

Again: not enough. Some apps (many!) deal with multiple files.
On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote:
> > The rub is this: how do you know when a file edit/modify has completed?
>
> Not to me, I'm sorry; this is the task of the engineer, the implementer.
> (See 'atomic', above.) It would be a shame if a file system never
> knew whether the operation was completed.

The filesystem knows if a filesystem operation completed. It can't know application state. You keep missing that.

> > If an application has many files then an "edit/modify" may include
> > updates and/or removals of more than one file. So once again: how do
> > you know when an edit/modify has completed?
>
> So an 'edit' fires off a few child processes to do this and that, and
> then you forget about them, hoping that they do a proper job. Oh,
> this gives me confidence ;)

You'd rather the filesystem guess application state than have the app tell it? Weird. Your other alternative -- saving a history of every write -- doesn't work because you can't tell what point in time is safe to restore to.

> No, seriously, let's look at some applications:
>
> A. The user works in Office (StarOffice, sure!) and clicks 'Save' on the
> current work before making major modifications. So the last state of
> the document (odt) is stored. Currently we can set some Backup
> option to run regularly. Meaning that the backup could have
> happened at the very wrong moment, while saving the state on each user
> request to 'Save' is much better.

So modify the office suite to call a new syscall that says "I'm internally consistent in all these files" and boom, the filesystem can now take a useful snapshot.

> B. A bunch of e-mails are read from the Inbox and stored locally
> (think Maildir). The user sees the sender, doesn't know her, and
> deletes all of them. Of course, the deletion process will have fired
> up the CDP engine ('event') and retired the files instead of deleting them.
> So when the sender calls, and the user learns that he made a big
> mistake, he can roll back to before the deletion (event).

Now think of an application like this but which uses, say, SQLite (e.g., Firefox 3.x, Thunderbird, ...). The app might never close the database file, just fsync() once in a while. The DB might have multiple files (in the SQLite case that might be multiple DBs ATTACHed into one "database connection"). Also, an fsync() of a SQLite journal file is not as useful to CDP as an fsync() of the SQLite DB proper. Now add any of a large number of databases and apps to the mix and forget it -- the heuristics become impossible or mostly useless.

> C. (Sticking with /home/) I agree with you that the rather continuous
> changes to the dot-files and dot-directories in the user's HOME that
> serve JDS, and many more, may not necessarily allow a valid state
> of the settings to be reconstituted at each and every moment.
> Still, chances are high that they will. In the worst case, the

"Chances"? So what, we tell the user to try restoring to this snapshot, log in again, and if stuff doesn't work, then try another snapshot? What if the user discovers too late that the selected snapshot was inconsistent, and by then they've made other changes?

> unlucky user can roll back to when he last took a break, if only for
> grabbing another coffee, because it took a minute, the writes (see

That sounds mighty painful. I'd rather modify some high-profile apps to tell the filesystem that their state is consistent, so take a snapshot.

> Maybe Sun can't do this, but wait for Apple, and OS X 10-dot-something
> (using ZFS as default!) will know how to do it. (And they probably
> also know when their 'writes' are done.)

I'm giving you the best answer -- modify the apps -- and you reject it. Given how many important apps Apple controls, it wouldn't surprise me if they did what I suggest. We should do it too. But one step at a time. We need to set up a project, gather requirements, design a solution, ... And since the solution will almost certainly entail modifications to apps where heuristics won't help, well, I think this would be a project with fairly wide scope, which means it likely won't go fast.

Nico
--
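Nico's "I'm internally consistent in all these files" syscall does not exist; as a thought experiment, it could be modeled in user space like this. The `CdpEngine` class and `mark_consistent` method are hypothetical names for illustration, with file copies standing in for real snapshots:

```python
import os, shutil, tempfile

class CdpEngine:
    """Toy CDP engine: it snapshots *only* when an application declares
    that a set of files is internally consistent (a user-space model of
    the hypothetical syscall discussed above)."""
    def __init__(self, history_dir):
        self.history_dir = history_dir
        self.snapshots = []
    def mark_consistent(self, paths):
        stamp = os.path.join(self.history_dir, "snap-%d" % len(self.snapshots))
        os.mkdir(stamp)
        for p in paths:                 # capture every file of the set together
            shutil.copy2(p, stamp)
        self.snapshots.append(stamp)
        return stamp

# An "office suite" maintaining two co-dependent files:
workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "report.odt")
idx = os.path.join(workdir, "report.idx")
engine = CdpEngine(tempfile.mkdtemp())

with open(doc, "w") as f: f.write("v1 body")
with open(idx, "w") as f: f.write("v1 index")
engine.mark_consistent([doc, idx])      # both files are valid *together*

with open(doc, "w") as f: f.write("v2 body")   # mid-edit: idx not yet updated
# No snapshot here: the app has not declared consistency, so no
# half-updated state can ever be offered for restore.
```

The point of the model is the restore guarantee: every recorded snapshot is an app-declared consistency point, which is exactly what write-history CDP cannot promise.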
It occurred to me that we are likely missing the point here, because Uwe is thinking of this from a one-user-on-a-system sort of perspective, whereas most of the rest of us are thinking of it from a 'Solaris' perspective, where we typically expect the system to be running many applications / DBs / users all at the same time.

In Uwe's use cases thus far, it seems that he is interested only in simple single-user style applications, if I'm not mistaken, so he's not considering the consequences of what it *really* means to have CDP in the way he wishes.

Uwe - am I close here?

Nathan.

Nicolas Williams wrote:
> On Tue, Feb 26, 2008 at 06:34:04PM -0800, Uwe Dippel wrote:
>>> The rub is this: how do you know when a file edit/modify has completed?
>> Not to me, I'm sorry; this is the task of the engineer, the implementer.
>> (See 'atomic', above.) It would be a shame if a file system never
>> knew whether the operation was completed.
>
> The filesystem knows if a filesystem operation completed. It can't know
> application state. You keep missing that.
>
>>> If an application has many files then an "edit/modify" may include
>>> updates and/or removals of more than one file. So once again: how do
>>> you know when an edit/modify has completed?
>> So an 'edit' fires off a few child processes to do this and that, and
>> then you forget about them, hoping that they do a proper job. Oh,
>> this gives me confidence ;)
>
> You'd rather the filesystem guess application state than have the app
> tell it? Weird. Your other alternative -- saving a history of every
> write -- doesn't work because you can't tell what point in time is safe
> to restore to.
>
>> No, seriously, let's look at some applications:
>>
>> A. The user works in Office (StarOffice, sure!) and clicks 'Save' on the
>> current work before making major modifications. So the last state of
>> the document (odt) is stored. Currently we can set some Backup
>> option to run regularly. Meaning that the backup could have
>> happened at the very wrong moment, while saving the state on each user
>> request to 'Save' is much better.
>
> So modify the office suite to call a new syscall that says "I'm
> internally consistent in all these files" and boom, the filesystem can
> now take a useful snapshot.
>
>> B. A bunch of e-mails are read from the Inbox and stored locally
>> (think Maildir). The user sees the sender, doesn't know her, and
>> deletes all of them. Of course, the deletion process will have fired
>> up the CDP engine ('event') and retired the files instead of deleting them.
>> So when the sender calls, and the user learns that he made a big
>> mistake, he can roll back to before the deletion (event).
>
> Now think of an application like this but which uses, say, SQLite (e.g.,
> Firefox 3.x, Thunderbird, ...). The app might never close the database
> file, just fsync() once in a while. The DB might have multiple files
> (in the SQLite case that might be multiple DBs ATTACHed into one
> "database connection"). Also, an fsync() of a SQLite journal file is not
> as useful to CDP as an fsync() of the SQLite DB proper. Now add any of a
> large number of databases and apps to the mix and forget it -- the
> heuristics become impossible or mostly useless.
>
>> C. (Sticking with /home/) I agree with you that the rather continuous
>> changes to the dot-files and dot-directories in the user's HOME that
>> serve JDS, and many more, may not necessarily allow a valid state
>> of the settings to be reconstituted at each and every moment.
>> Still, chances are high that they will. In the worst case, the
>
> "Chances"? So what, we tell the user to try restoring to this snapshot,
> log in again, and if stuff doesn't work, then try another snapshot? What
> if the user discovers too late that the selected snapshot was
> inconsistent, and by then they've made other changes?
>
>> unlucky user can roll back to when he last took a break, if only for
>> grabbing another coffee, because it took a minute, the writes (see
>
> That sounds mighty painful.
>
> I'd rather modify some high-profile apps to tell the filesystem that
> their state is consistent, so take a snapshot.
>
>> Maybe Sun can't do this, but wait for Apple, and OS X 10-dot-something
>> (using ZFS as default!) will know how to do it. (And they probably
>> also know when their 'writes' are done.)
>
> I'm giving you the best answer -- modify the apps -- and you reject it.
> Given how many important apps Apple controls, it wouldn't surprise me if
> they did what I suggest. We should do it too. But one step at a time.
> We need to set up a project, gather requirements, design a solution, ...
> And since the solution will almost certainly entail modifications to
> apps where heuristics won't help, well, I think this would be a project
> with fairly wide scope, which means it likely won't go fast.
>
> Nico
Hmm, two thoughts on this:

1. For anyone interested, didn't VMS do something like this? Perhaps a look at its design and implementation would be useful here.

2. For the per-application issue, there are ways to handle that. First, make a ZFS API for providing file-level snapshots. Then, a library wrapper around the normal syscalls (open, close, read, write, etc.) that invokes the ZFS APIs as needed. Either the wrapper is smart enough to know which app wants which behavior (perhaps even specializing on the path of the file), or several libraries are made available for different tasks. Shove it/them into something like LD_PRELOAD and you'd be good to go.

As for utility, I think this sort of thing would be fantastic in certain areas. If you can develop the feature set cheaply enough, then it's a real win. I haven't touched ZFS's internals (in code or even dev docs), so I don't know what kind of work is required to pull off file-level snapshots.

--
H. Lally Singh
Ph.D. Candidate, Computer Science
Virginia Tech
[i]Even then, I'm still confused as to how I would do anything much useful with this over and above, say, 1 minute snapshots.[/i]

Hi Nathan,

I was hoping to be clear with my examples. Within that 1 minute the user has easily received the mail alert that 5 mails have arrived, has seen the sender, and has deleted them -- without any trigger of a snapshot, or storage of the state while the messages were actually on the drive. No recovery possible. One minute is much too long. Taking the average reaction time of users, we cannot expect, on the other hand, that the user performs more than two operations within less than a second (receiving the notice, recognising the sender, clicking 'Delete').

On the other hand, one minute is much too frequent w.r.t. efficient usage of resources. The normal situation on a workstation, within a 1-minute difference in time, is that the file(s) on which the user works are unmodified. It might please the vendors of hardware and storage space to try a snapshot once per minute, but normally the actual changed content will be zero.

Logical consequence: if one minute is much too long w.r.t. recovery and at the same time too short for scheduled snapshots, the whole thing is based on wrong premises. In this case, the wrong assumption that scheduled snapshots could serve the intended purpose of a versioning system comprising all relevant versions.

As much as ZFS is revolutionary, it is far away from being the 'ultimate file system' if it doesn't know how to handle event-driven snapshots (I don't like the word), backups, versioning. As long as a high-level system utility needs to be invoked by a scheduler for these features (CDP), and -- this is relevant -- *ZFS does not support these functionalities essentially differently from FAT or UFS*, the days of ZFS are numbered. Sooner or later, and I bet it is sooner, someone will design a file system (hardware, software, Cairo) to which the tasks of retiring files, as well as creating versions of modified files, can be passed down, together with the file handles.

No need to believe me. But remember, you read it here first.

Uwe

This message posted from opensolaris.org
Uwe, I think you are assuming that ZFS is cast in stone; features are added to ZFS almost on a weekly basis. If there is demand for a certain feature, then at some point resources may be made available.

What form would you want file versioning to take? I immensely disliked VMS's ";X" notation for files. I'm not sure how this worked: was it transactional, and did files only appear as "file;N+1" after they had been completely written and closed?

How would such snapshots appear, and where? (Again, I disliked the "file;X" notation and the fact that a manual purge was required.)

Casper
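The ";N" behavior Casper asks about can be made concrete with a toy sketch. This is an illustration of the naming scheme only, not how VMS (Files-11) actually implemented it: each completed save materializes a new `name;N` entry, and old versions accumulate until purged.

```python
import os, tempfile

def versioned_save(path, data):
    """Toy VMS-style versioning: each completed save creates 'name;N',
    with N one higher than the newest existing version. The new version
    only appears once it has been fully written; the plain name carries
    the latest content."""
    d, base = os.path.split(path)
    existing = [int(f.rsplit(";", 1)[1]) for f in os.listdir(d)
                if f.startswith(base + ";")]
    n = max(existing, default=0) + 1
    vpath = os.path.join(d, "%s;%d" % (base, n))
    with open(vpath, "w") as f:     # write the numbered version first
        f.write(data)
    with open(path, "w") as f:      # then update the plain name
        f.write(data)
    return vpath

d = tempfile.mkdtemp()
p = os.path.join(d, "notes.txt")
versioned_save(p, "first")
v2 = versioned_save(p, "second")
print(sorted(os.listdir(d)))        # notes.txt, notes.txt;1, notes.txt;2
```

Note that nothing here ever deletes a version, which is exactly the manual-purge annoyance Casper remembers.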
Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
> > Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
> > > Are you indicating that the filesystem knows or should know what
> > > an application is doing?
> >
> > Maybe "snapshot the file whenever a write file descriptor is closed", or
> > somesuch?
>
> Again. Not enough. Some apps (many!) deal with multiple files.

So what? Why would every file-snapshot have to be a file that's valid for the application(s) using it? (Certainly zfs snapshots don't provide that property either, nor does any other backup-related system I've seen.)

- Marcus
Marcus Sundman wrote:
> Nicolas Williams <Nicolas.Williams at sun.com> wrote:
>> On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
>>> Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
>>>> Are you indicating that the filesystem knows or should know what
>>>> an application is doing?
>>> Maybe "snapshot the file whenever a write file descriptor is closed", or
>>> somesuch?
>> Again. Not enough. Some apps (many!) deal with multiple files.
>
> So what? Why would every file-snapshot have to be a file that's valid
> for the application(s) using it? (Certainly zfs snapshots don't provide
> that property either, nor does any other backup-related system I've seen.)

If it isn't, how does the user or application know that it is safe to use that file? Is it okay to provide a snapshot of a file that is corrupt and will cause further, more serious data corruption in the application?

--
Darren J Moffat
Casper.Dik at Sun.COM wrote:
> How would such snapshots appear and where? (Again, I disliked the "file;X"
> notation and the fact that a manual purge was required.)

I agree about the ';x'.

However (and I don't know what the patents are in this area), something like what ClearCase does (an invisible directory for each file, full of the file's history) might be interesting.

-Kyle
On Wed, Feb 27, 2008 at 05:20:42AM -0500, Lally Singh wrote:
> 1. For anyone interested, didn't VMS do something like this? Perhaps
> a look at its design and implementation would be useful here.

IBM MVS had generations. Each rewrite of a file created a new generation of that file. Referential integrity is a more complex issue, of course.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Kyle McDonald wrote:
> Casper.Dik at Sun.COM wrote:
>> How would such snapshots appear and where? (Again, I disliked the "file;X"
>> notation and the fact that a manual purge was required.)
>
> I agree about the ';x'.
>
> However (and I don't know what the patents are in this area), something
> like what ClearCase does (an invisible directory for each file, full of
> the file's history) might be interesting.

What, like a .zfs/snapshot directory? :-)

--
Darren J Moffat
Darren J Moffat <Darren.Moffat at Sun.COM> wrote:
> Marcus Sundman wrote:
>> Nicolas Williams <Nicolas.Williams at sun.com> wrote:
>>> On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
>>>> Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
>>>>> Are you indicating that the filesystem knows or should know what
>>>>> an application is doing?
>>>> Maybe "snapshot the file whenever a write file descriptor is closed", or
>>>> somesuch?
>>> Again. Not enough. Some apps (many!) deal with multiple files.
>>
>> So what? Why would every file-snapshot have to be a file that's
>> valid for the application(s) using it? (Certainly zfs snapshots
>> don't provide that property either, nor does any other backup-related
>> system I've seen.)
>
> If it isn't, how does the user or application know that it is safe to use
> that file?

Unless the files contain some checksum or somesuch, then I guess it doesn't know it's safe. However, that's unavoidable unless the application can use a transaction-supporting fs API.

> Is it okay to provide a snapshot of a file that is corrupt and will
> cause further, more serious data corruption in the application?

Well, apparently so. That's what zfs snapshots do. That's what all backup tools do. Sure, it would be better to have full transactions in the fs API, but without that I don't think it's possible to do any better than "the file might be corrupt or it might not; good luck if your file format doesn't support corruption detection".

- Marcus
On Feb 27, 2008, at 8:36 AM, Uwe Dippel wrote:
> As much as ZFS is revolutionary, it is far away from being the
> 'ultimate file system', if it doesn't know how to handle event-
> driven snapshots (I don't like the word), backups, versioning. As
> long as a high-level system utility needs to be invoked by a
> scheduler for these features (CDP), and - this is relevant - *ZFS
> does not support these functionalities essentially differently from
> FAT or UFS*, the days of ZFS are numbered. Sooner or later, and I bet
> it is sooner, someone will design a file system (hardware, software,
> Cairo) to which the tasks of retiring files, as well as creating
> versions of modified files, can be passed down, together with the
> file handles.

meh .. don't believe all the marketing hype you hear - it's good at what it's good at, and is a constant WIP for many of the other features that people would like to see .. but the "one ring to rule them all" - not quite yet ..

as for the CDP issue - i believe the event driving would really have to happen below ZFS, at the vnode or znode layer .. keep in mind that with the ZPL we're still dealing with 30+ year old structures and methods (which is fine, btw) in the VFS/vnode layers .. a couple of areas i would look at (that i haven't seen mentioned in this discussion) might be:

- fop_vnevent .. or the equivalent (if we have one yet) for a znode
- filesystem <-> door interface for event handling
- auditing

if you look at what some of the other vendors (eg: apple/timemachine) are doing - it's essentially a tally of file change events that get dumped into a database and rolled up at some point .. if you plan on taking more immediate action on the file changes, then i believe you'll run into latency (race) issues for synchronous semantics

anyhow - just a thought from another who is constantly learning (being corrected, learning some more, more correction, etc ..)

---
.je
Darren J Moffat wrote:
> Kyle McDonald wrote:
>> Casper.Dik at Sun.COM wrote:
>>> How would such snapshots appear and where? (Again, I disliked the
>>> "file;X" notation and the fact that a manual purge was required.)
>>
>> I agree about the ';x'.
>>
>> However (and I don't know what the patents are in this area),
>> something like what ClearCase does (an invisible directory for each
>> file, full of the file's history) might be interesting.
>
> What, like a .zfs/snapshot directory? :-)

I was thinking more of file-by-file access. ls might show just a file named 'foo', but if you typed cd foo@@/ and then ls, you might see files in this new 'directory' named 1, 2, 3, etc. that represent the snapshots of that file over time.

I mentioned ClearCase, but I was not suggesting implementing an SCM tool -- just using its existing UI for live access to previous versions.

-Kyle
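Kyle's `foo@@/` view can be sketched as a user-space model. The `.@@` history area, `save`, and `versions` names are hypothetical illustrations of the UI he describes, with copies standing in for real per-file snapshots:

```python
import os, tempfile

HIST = ".@@"   # hypothetical invisible per-directory history area

def save(path, data):
    """Save 'path' and record a numbered copy under an invisible
    per-file history directory, ClearCase-style."""
    d, base = os.path.split(path)
    hdir = os.path.join(d, HIST, base)
    os.makedirs(hdir, exist_ok=True)
    n = len(os.listdir(hdir)) + 1
    with open(os.path.join(hdir, str(n)), "w") as f:
        f.write(data)
    with open(path, "w") as f:
        f.write(data)

def versions(path):
    """The 'cd foo@@; ls' view: numbered snapshots of one file,
    usable with grep/diff like regular files."""
    d, base = os.path.split(path)
    return sorted(os.listdir(os.path.join(d, HIST, base)), key=int)

d = tempfile.mkdtemp()
p = os.path.join(d, "foo")
save(p, "one"); save(p, "two"); save(p, "three")
print(versions(p))        # ['1', '2', '3']
```

The appeal of this layout is exactly what Kyle notes: each version is an ordinary file, so existing tools work on it with no new commands.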
On Wed, 27 Feb 2008, Nicolas Williams wrote:
>> Maybe "snapshot the file whenever a write file descriptor is closed", or
>> somesuch?
>
> Again. Not enough. Some apps (many!) deal with multiple files.

Or, more significantly, with multiple pages. When using memory mapping, the application may close its file descriptor, but the underlying file is then updated in a somewhat random fashion as "dirty pages" are written to disk. It seems that this hypothesis is without merit.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
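Bob's memory-mapping objection is easy to demonstrate: a mapping keeps modifying the file after the write descriptor is closed, so a snapshot-on-close trigger would fire before these writes even happen. A small sketch:

```python
import mmap, os, tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")
m = mmap.mmap(fd, 10)        # map the file into memory
os.close(fd)                 # the write descriptor is now closed...

m[0:5] = b"HELLO"            # ...yet the file is still being modified
m.flush()                    # dirty pages are pushed to the file only now
m.close()

with open(path, "rb") as f:
    print(f.read())          # b'HELLO56789'
```

Any close-based heuristic would have captured "0123456789" and called it a version, while the application's real data arrived later via the page cache.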
On Wed, 27 Feb 2008, Uwe Dippel wrote:
> As much as ZFS is revolutionary, it is far away from being the
> 'ultimate file system', if it doesn't know how to handle
> event-driven snapshots

UFS == Ultimate File System
ZFS == Zettabyte File System

Perhaps you have these two confused? ZFS does not lay claim to being the ultimate file system.

You can provide great benefit to society if you invent and implement a filesystem with all that ZFS offers, plus your remarkable ideas, provided that the result still delivers the performance that users expect and there is sufficient storage space available. Consider this to be your life's mission.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> UFS == Ultimate File System
> ZFS == Zettabyte File System

It's a nit, but..

UFS != Ultimate File System
ZFS != Zettabyte File System

cheers,
--justin
On Wed, Feb 27, 2008 at 03:57:29PM +0200, Marcus Sundman wrote:
> Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> > On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote:
> > > Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote:
> > > > Are you indicating that the filesystem knows or should know what
> > > > an application is doing?
> > >
> > > Maybe "snapshot the file whenever a write file descriptor is closed", or
> > > somesuch?
> >
> > Again. Not enough. Some apps (many!) deal with multiple files.
>
> So what? Why would every file-snapshot have to be a file that's valid
> for the application(s) using it? (Certainly zfs snapshots don't provide
> that property either, nor does any other backup-related system I've seen.)

With CDP you'd have thousands (and possibly many more) of snapshots a day to choose from when restoring. With backups you get to quiesce the apps/system, and you don't run them that often; with CDP the whole point is that you don't have to quiesce the system, and it runs continuously. So I see a tremendous qualitative difference between CDP and snapshots/backups.

The question remains: how do you pick a CDP snapshot to restore to? How do you even know which files are relevant to whatever problem you're trying to solve by restoring to a CDP snapshot?

I'm convinced that the answer is that we need new system calls by which apps can inform the system about the state of app-level filesystem transactions. Modify a few high-profile apps to support this and you've got a good chance of getting momentum behind CDP (i.e., of getting other, less visible apps updated too, and getting third parties to update their enterprise apps).

Nico
--
On Wed, Feb 27, 2008 at 10:33:13AM -0500, Kyle McDonald wrote:
> Darren J Moffat wrote:
> > Kyle McDonald wrote:
> >> Casper.Dik at Sun.COM wrote:
> >>> How would such snapshots appear and where? (Again, I disliked the
> >>> "file;X" notation and the fact that a manual purge was required.)
> >>
> >> I agree about the ';x'.
> >
> > What, like a .zfs/snapshot directory? :-)
>
> I was thinking more of file-by-file access.

Make it an extended attribute called .zfs/snapshot/.
Nicolas Williams wrote:
> On Wed, Feb 27, 2008 at 10:33:13AM -0500, Kyle McDonald wrote:
>> Darren J Moffat wrote:
>>> Kyle McDonald wrote:
>>>> Casper.Dik at Sun.COM wrote:
>>>>> How would such snapshots appear and where? (Again, I disliked the
>>>>> "file;X" notation and the fact that a manual purge was required.)
>>>> I agree about the ';x'.
>>> What, like a .zfs/snapshot directory? :-)
>> I was thinking more of file-by-file access.
>
> Make it an extended attribute called .zfs/snapshot/.

Maybe I'm not up on how extended attributes work, but I don't see how that would let you review all the versions that file might have had -- use grep and diff on them like they're regular files, etc.

-Kyle
On Wed, Feb 27, 2008 at 12:57:12PM -0500, Kyle McDonald wrote:> Nicolas Williams wrote: > >Make it an extended attribute called .zfs/snapshot/. > > > Maybe I''m not up on how extended attributes work, but I don''t see how > that would let you review all the versions that file might have had. Use > grep and diff on them like they''re regular files. etc.man runat
On Wed, Feb 27, 2008 at 05:49:56PM +1100, Nathan Kroenert wrote:
> It occurred to me that we are likely missing the point here because Uwe
> is thinking of this from a One User on a System sort of perspective,
> whereas most of the rest of us are thinking of it from a 'Solaris'
> perspective, where we are typically expecting the system to be running
> many applications / DBs / users all at the same time.

I'm looking at it both ways. Either way one would want to know what snapshot is safe to restore to! Especially if there are _many_ snapshots automatically taken, as opposed to a few manual or manually scheduled snapshots/backups (one typically quiesces important apps before a backup starts).

Nico
--
Nicolas Williams wrote:
> On Wed, Feb 27, 2008 at 12:57:12PM -0500, Kyle McDonald wrote:
>> Nicolas Williams wrote:
>>> Make it an extended attribute called .zfs/snapshot/.
>>
>> Maybe I'm not up on how extended attributes work, but I don't see how
>> that would let you review all the versions that file might have had. Use
>> grep and diff on them like they're regular files, etc.
>
> man runat

Oh! Cool!

Is that the only way to access those attributes? Or just the one that's most likely to work?

I can see how for running commands it'd be useful, but for interactive use it's too bad 'cd' can't work. Or can it? I wasn't able to get it to.

-Kyle
On Wed, Feb 27, 2008 at 01:13:06PM -0500, Kyle McDonald wrote:> Nicolas Williams wrote: > >man runat > > > Oh! Cool! > > Is that the only way to access those attributes? or just the one that''s > most likely to work?man fsattr :)> I can see how for running commands it''d be useful, but for interactive > use it''s too bad ''cd'' can''t work. or can it? I wasn''t able to get it to.Er, good question! I think the shells would have to support it. A good question for Roland :)
Casper.Dik at Sun.COM wrote:> > (Again, I disliked the "file;X" > notation and the fact that a manual purge was required).You could set the number of revisions to keep; VMS would delete older ones. Michael -- Michael Schuster http://blogs.sun.com/recursion Recursion, n.: see ''Recursion''
Nicolas Williams wrote:> On Wed, Feb 27, 2008 at 01:13:06PM -0500, Kyle McDonald wrote: > >>Nicolas Williams wrote: >> >>>man runat >>> >> >>Oh! Cool! >> >>Is that the only way to access those attributes? or just the one that''s >>most likely to work? > > > man fsattr > > :) > > >>I can see how for running commands it''d be useful, but for interactive >>use it''s too bad ''cd'' can''t work. or can it? I wasn''t able to get it to. > > > Er, good question! I think the shells would have to support it. A good > question for Roland :)The shells don''t actually have to care: $ cd /tmp $ touch f1 $ runat f1 sh Now my shell is running in file f1''s extended attribute space. $ ls SUNWattr_ro SUNWattr_rw $
On Wed, Feb 27, 2008 at 12:31:09PM -0600, Chris Kirby wrote:> >Er, good question! I think the shells would have to support it. A good > >question for Roland :) > > The shells don''t actually have to care: > > $ cd /tmp > $ touch f1 > $ runat f1 shI know that works. But why start a new process when the shell could have a built-in (or mod to the cd built-in) that can do this?
Nicolas Williams wrote:
> On Wed, Feb 27, 2008 at 12:31:09PM -0600, Chris Kirby wrote:
>>> Er, good question! I think the shells would have to support it. A good
>>> question for Roland :)
>>
>> The shells don't actually have to care:
>>
>> $ cd /tmp
>> $ touch f1
>> $ runat f1 sh
>
> I know that works. But why start a new process when the shell could
> have a built-in (or mod to the cd built-in) that can do this?

How was it MVFS could do this without any changes to the shells or any other programs?

In ClearCase I could 'grep FOO /dir1/dir2/file@@/main/*' to see which version of 'file' added FOO. (I think @@ was the special hidden key. It might have been something else though.)

The shells accessed that path just like any other. 'ls' didn't show them, but if you accessed them they were there.

-Kyle
Nicolas.Williams at Sun.COM
2008-Feb-27 18:47 UTC
[zfs-discuss] Can ZFS be event-driven or not?
Kyle McDonald wrote:> Nicolas Williams wrote: >> On Wed, Feb 27, 2008 at 12:31:09PM -0600, Chris Kirby wrote: >>> The shells don''t actually have to care: >>> >>> $ cd /tmp >>> $ touch f1 >>> $ runat f1 sh >>> >> >> I know that works. But why start a new process when the shell could >> have a built-in (or mod to the cd built-in) that can do this? >> >> > How was it MVFS could do this without any changes to the shells or any > other programs? > > I ClearCase could ''grep FOO /dir1/dir2/file@@/main/*'' to see which > version of ''file'' added FOO. > (I think @@ was the special hidden key. It might have been something > else though.) > > The shells accessed that path just like any other. ''ls'' didn''t show > them, but if you accessed them they were there. > > -Kyle > >Via interposers, most likely.
Nicolas Williams wrote:> On Wed, Feb 27, 2008 at 12:31:09PM -0600, Chris Kirby wrote: > >>>Er, good question! I think the shells would have to support it. A good >>>question for Roland :) >> >>The shells don''t actually have to care: >> >>$ cd /tmp >>$ touch f1 >>$ runat f1 sh > > > I know that works. But why start a new process when the shell could > have a built-in (or mod to the cd built-in) that can do this?Yep, that certainly could be done with just a few lines of code. I was just demonstrating that it could be done now, in an interactive session.
>On Wed, Feb 27, 2008 at 12:31:09PM -0600, Chris Kirby wrote: >> >Er, good question! I think the shells would have to support it. A good >> >question for Roland :) >> >> The shells don''t actually have to care: >> >> $ cd /tmp >> $ touch f1 >> $ runat f1 sh > >I know that works. But why start a new process when the shell could >have a built-in (or mod to the cd built-in) that can do this?Change all shells (and make it only in the ones we changed, maintain deltas unless we can convince upstream, yadayadayada). New features should not be available as commands. Why optimize when performance is not an issue? Casper
>Via interposers, most likely.It''s in the kernel so it didn''t need to interpose; it just has that functionality in the kernel modules. Not POSIX compliant, but that''s how it is. Casper
On Wed, Feb 27, 2008 at 9:36 PM, Uwe Dippel <udippel at gmail.com> wrote:
> I was hoping to be clear with my examples.
> Within that 1 minute the user has easily received the mail alert that 5 mails have arrived, has seen the sender and deleted them. Without any trigger of some snapshot, or storage of that state while the messages were actually on the drive. No recovery possible. One minute is much too long. Taking the average reaction time of users, we cannot expect, on the other hand, that the user is able to perform more than two operations within less than a second (receiving the notice, recognising the sender, clicking 'Delete').

Uwe,

In this case, it is easier for the email application to _not_ delete the email but just move it to a time-delayed trash. This is not dissimilar to what Gmail does, which gives you 30 days to "regret" your deleting decision.

You will find that a lot of such "protection" is best implemented at the application level, e.g. Oracle transaction logs, because the data loses its meaning further down the stack. At the FS layer, it is best to think about how you can support the application to do what it wants instead of doing it for the application.

--
Just me,
Wire ...
Blog: <prstat.blogspot.com>
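[Editor's sketch] The time-delayed trash suggested above is easy to do at the application level; here is a minimal, illustrative sketch (all names invented; collisions between same-second deletions and other real-world concerns are ignored):

```python
# Minimal sketch of application-level "time-delayed trash": a delete
# moves the message aside, and a purge pass removes only items whose
# grace period has expired. Illustrative only.
import os
import shutil
import time

TRASH_GRACE_SECONDS = 30 * 24 * 3600  # e.g. a Gmail-style 30-day window

def trash_delete(path, trash_dir):
    """'Delete' a file by moving it into the trash, tagged with a timestamp."""
    os.makedirs(trash_dir, exist_ok=True)
    dest = os.path.join(trash_dir,
                        "%d.%s" % (int(time.time()), os.path.basename(path)))
    shutil.move(path, dest)
    return dest

def purge_trash(trash_dir, now=None, grace=TRASH_GRACE_SECONDS):
    """Permanently remove trashed items older than the grace period."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(trash_dir):
        ts = int(name.split(".", 1)[0])
        if now - ts > grace:
            os.unlink(os.path.join(trash_dir, name))
            removed.append(name)
    return removed
```

The user's "Delete" click calls trash_delete(); recovery is just moving the file back, and no filesystem snapshot machinery is needed for this case at all.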
On Wed, Feb 27, 2008 at 10:42 PM, Marcus Sundman <sundman at iki.fi> wrote:> Darren J Moffat <Darren.Moffat at Sun.COM> wrote: > > Marcus Sundman wrote: > > > Nicolas Williams <Nicolas.Williams at sun.com> wrote: > > >> On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman wrote: > > >>> Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote: > > >>>> Are you indicating that the filesystem know''s or should know what > > >>>> an application is doing?? > > >>> Maybe "snapshot file whenever a write-filedescriptor is closed" or > > >>> somesuch? > > >> Again. Not enough. Some apps (many!) deal with multiple files. > > > > > > So what? Why would every file-snapshot have to be a file that''s > > > valid for the application(s) using it? (Certainly zfs snapshots > > > don''t provide that property either, nor any other backup-related > > > system I''ve seen.) > > > > If it isn''t how does the user or application know that is safe to use > > that file ? > > Unless the files contain some checksum or somesuch then I guess it > doesn''t know it''s safe. However, that''s unavoidable unless the > application can use a transaction-supporting fs api.Checksums only tell you the data file is good. If you have a whole load of backups (one every nano-second) and none of them have a good checksum, you are still very screwed.> > Is it okay to provide a snapshot of a file that is corrupt and will > > cause further more serious data corruption in the application ? > > Well, apparently so. That''s what zfs snapshots do. That''s what all > backup tools do. Sure it would be better to have full transactions in > the fs api, but without that I don''t think it''s possible to do any > better than "the file might be corrupt or it might not, good luck if > your file format doesn''t support corruption-detection".A good backup practice increases (significantly) the likelihood of getting a usable backup. E.g. 
you quiesce Oracle before you start your backup to make sure that the datafiles you backup are consistent. Still, you are missing the point. What''s the point of backing up if you cannot use it for restoring your environment? -- Just me, Wire ... Blog: <prstat.blogspot.com>
"Wee Yeh Tan" <weeyeh at gmail.com> wrote:> On Wed, Feb 27, 2008 at 10:42 PM, Marcus Sundman <sundman at iki.fi> > wrote: > > Darren J Moffat <Darren.Moffat at Sun.COM> wrote: > > > Marcus Sundman wrote: > > > > Nicolas Williams <Nicolas.Williams at sun.com> wrote: > > > >> On Wed, Feb 27, 2008 at 05:54:29AM +0200, Marcus Sundman > > > >> wrote: > > > >>> Nathan Kroenert <Nathan.Kroenert at Sun.COM> wrote: > > > >>>> Are you indicating that the filesystem know''s or should > > > >>>> know what an application is doing?? > > > >>> Maybe "snapshot file whenever a write-filedescriptor is > > > >>> closed" or somesuch? > > > >> Again. Not enough. Some apps (many!) deal with multiple > > > >> files. > > > > > > > > So what? Why would every file-snapshot have to be a file that''s > > > > valid for the application(s) using it? (Certainly zfs snapshots > > > > don''t provide that property either, nor any other > > > > backup-related system I''ve seen.) > > > > > > If it isn''t how does the user or application know that is safe > > > to use that file ? > > > > Unless the files contain some checksum or somesuch then I guess it > > doesn''t know it''s safe. However, that''s unavoidable unless the > > application can use a transaction-supporting fs api. > > Checksums only tell you the data file is good. If you have a whole > load of backups (one every nano-second) and none of them have a good > checksum, you are still very screwed.True. However, this is equally true for zfs snapshots. If I undestood the concept of CDP correctly then each zfs snapshot would provide a subset of the set of all versions in the CDP database. Thus, CDP couldn''t possibly provide less protection than zfs snapshots (although it might be harder to find the right versions of files). 
So, if you think zfs snapshots provide enough protection then you can''t claim CDP doesn''t.> > > Is it okay to provide a snapshot of a file that is corrupt and > > > will cause further more serious data corruption in the > > > application ? > > > > Well, apparently so. That''s what zfs snapshots do. That''s what all > > backup tools do. Sure it would be better to have full transactions > > in the fs api, but without that I don''t think it''s possible to do > > any better than "the file might be corrupt or it might not, good > > luck if your file format doesn''t support corruption-detection". > > A good backup practice increases (significantly) the likelihood of > getting a usable backup. E.g. you quiesce Oracle before you start > your backup to make sure that the datafiles you backup are consistent.True for both ZFS snapshots and CDP, except that with CDP you don''t have to make the actual snapshot since that''s automated.> Still, you are missing the point. What''s the point of backing up if > you cannot use it for restoring your environment?I think you are missing the point if you think ZFS snapshots are capable of something CDP is not. Also, I though the author of the original message wasn''t particularly interested in restoring the environment, but more about restoring individual files. As a kind of version history, or filesystem undo if you will. Maybe I misunderstood him. - Marcus
[i]Consider this to be your life's mission.[/i]

Bob, I can do without this.

Richard,

[i]Actually I use several browsers every day. Each browser has a cache located somewhere in my home directory and the cache is managed so that it won't grow very large. With CDP, I would fill my disk in a week or less, just by caching everything on the internet that I pass by.[/i]

if you RTFT, you'd find that nobody ever was interested in temp files.

[i]In Uwe's use cases thus far, it seems that he is interested in only the simple single user style applications, if I'm not mistaken, so he's not considering the consequences of what it *really* means to have CDP in the way he wishes. Uwe - am I close here?[/i]

Nathan, you are not. Again, there's nothing that I "wanted". I was only thinking. And I am a server person. Now, if I switch from /export/home/userfoo/Documents (for Richard, who might be happier with UZFS-CDP than with the shots of TimeMachine) to a file server, do the arguments still hold, that
1. The application (NFS - sftp) does not know about the state of writing?
2. Obviously nobody sees anything in having access to all versions of a file stored there?

In any case, my presentation at that enterprise-security related conference is done, the 'history' of backups presented (not exactly my topic). I introduced the idea of versioning, and the (possible) advantages of having all versions, including the (possible) disadvantages (storage space, mentioned despite my doubts). I also pointed out the currently available software for near-CDP, and mentioned the discussion we have in here; started for one and only one reason (see Subject): to confirm if ZFS can be instructed to produce a copy of each version of a file, initiated by some event instead of a scheduler.

Somewhat to my surprise, my presentation was a good success, and Q&A was focused on the event-driven backups, what the technical problems were, etc. A good handful of people approached me later, being curious and fascinated by the idea to replace the backup scheduler with an event-driven creation of the versions.

Therefore, to me the case is closed; my presentation done, on the successful side.

Thanks to everyone who cared to answer, help, contribute in one way or another,

Uwe

This message posted from opensolaris.org
> A good handful of people approached me later, being
> curious and fascinated by the idea to replace the
> backup scheduler with an event-driven creation of the
> versions.

Uwe,

I'm still struggling to decide if ADM is what you're looking for. When you make comments like the one quoted above, I think ADM is a very practical choice for you. Even if it isn't, the issues discussed here are what lead people to an ADM-like solution.

Let me attempt to summarize the dilemmas as I see them, and point out the practicality of an ADM-like solution...

* Application-agnostic CDP cannot know when the file state is sane. For true CDP this essentially requires preserving the entire write stream, which is an enormous burden (in both storage capacity and system bandwidth). Presumably this burden is unacceptable except in niche cases. Basically: it works, but it hurts.

* Application-aware/driven CDP solves the file sanity challenge by being explicitly told by the app. But this will have an inherently limited market because it relies on application support. Basically: it works, but requires coordination rarely found outside monopoly-owned stacks.

* Traditional backup leaves exposure windows and doesn't address the file sanity issue (unless there is a backup window, or specific assumptions). Basically: it's easy because it overlooks so much.

Unless you have a large budget, some compromises need to be made. IMO, ADM is a reasonable compromise for many.

With ADM, backing up files is typically initiated at a specified time after file modification. For this discussion, think of it as: "make a new backup any time file data is stable for X amount of time". There can be many policies for files with different usage patterns in a file system. These should be tailored to business value, anticipated modification frequency, etc. Here's a few examples of policies one might set up:

- Never backup files with /firefox/cache/ in the path.
- Backup (to disk) the CEO's Star-Office docs when they're stable for 1 minute.
- Backup (to disk) other users' Star-Office docs when they're stable for 5 minutes.
- Backup (to disk) all other files when stable for 5 hours.
- Make a second backup (to tape) of all files when they're stable for 24 hours.

Note how the file data stability time can ignorantly handle the file consistency issue. Pauses in file modification should generally occur when the data is consistent. If not, we'll back it up again anyway after the next round of modifications.

The overhead introduced by ADM is less than you might imagine... ADM/DMAPI can enable specific event types on a per-filesystem-object basis, so the versatility of the policies above does not come at the expense of excess chatter. ADM's evaluation of a file is triggered by a change or close event. So we look when there is reason to believe we have work to do.

ADM has several benefits relevant to this discussion:
- Automated management of the thousands/millions of backups. How many to keep, should they be migrated from disk to tape, etc.
- Automated reclaiming & reuse of media used for backups.
- No burden of maintaining the entire write stream.
- No requirement for application support.
- For most file access patterns, we should make good guesses on when the data is consistent.

If you're willing to give up the "last mile" requirement of CDP, ADM is a fairly cheap way to give you a lot of what you want.

Thoughts?

(in ADM we use the term "archive" but here I'm using the term "backup" since that's what you're using)

-Joe

This message posted from opensolaris.org
On Thu, 28 Feb 2008, Uwe Dippel wrote:
> 1. The application (NFS - sftp) does not know about the state of writing?

Sometimes applications know about the state of writing and sometimes they do not. Sometimes they don't even know they are writing.

> 2. Obviously nobody sees anything in having access to all versions of a file stored there?

First it is necessary to determine what "version" means when it comes to a file. At the application level, the system presents a different view than what is actually stored on disk, since the system uses several levels of write caching to improve performance. The only time that these should necessarily be the same is if the application uses a file descriptor to access the file (no memory mapping) and invokes fsync(). If memory mapping is used, the equivalent is msync() with the MS_SYNC option. Using fsync() or msync(MS_SYNC) blocks the application until the I/O is done. If a file is updated via memory mapping, then the data sent to the underlying file is based on the system's virtual memory system, so the data actually sent to disk may not be coherent at all.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
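[Editor's sketch] Bob's two flush paths can be shown side by side. Python is used for brevity: os.fsync() wraps fsync(2), and mmap's flush() issues msync() (with MS_SYNC on POSIX systems), so both calls block until the data reaches stable storage:

```python
# The two flush paths Bob describes: fsync() on a file descriptor, and
# msync(MS_SYNC) for a memory mapping (via Python's mmap.flush()).
import mmap
import os

def write_with_fsync(path, data):
    """Write via a file descriptor; fsync() blocks until the data is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # only now is the on-disk file guaranteed to match
    finally:
        os.close(fd)

def update_via_mmap(path, offset, data):
    """Update through a memory mapping; flush() issues msync(MS_SYNC)."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, 0) as m:
            m[offset:offset + len(data)] = data
            m.flush()  # block until the dirty pages hit the device
    finally:
        os.close(fd)
```

Without the fsync()/flush() calls, a snapshot taken "on change" could easily capture a state that no application ever saw, which is exactly Bob's coherency point.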
On Thu, Feb 28, 2008 at 04:05:50AM -0800, Uwe Dippel wrote:
> Again, there's nothing that I "wanted". I was only thinking. And I am
> a server person. Now, if I switch from
> /export/home/userfoo/Documents (for Richard, who might be happier with
> UZFS-CDP than with the shots of TimeMachine) to a file server, do the
> arguments still hold, that
> 1. The application (NFS - sftp) does not know about the state of
> writing?

The application isn't NFS -- the application is whatever is running on the client. And yes, if we add syscalls by which apps can mark their fs transactions as such, then we'll need a similar extension to NFSv4.

> 2. Obviously nobody sees anything in having access to all versions of
> a file stored there?

Why obviously? Once you have versioned files/CDP/whatever and expose APIs by which to create and access these things, then why shouldn't apps use this much like RCS, SCCS, ...? One possible answer: because CDP snapshots may be subject to automatic deletion when space is running low, whereas traditional version control systems aren't.

> [...]
> reason (see Subject): to confirm if ZFS can be instructed to produce a
> copy of each version of a file, initiated by some event instead of a
> scheduler.

You can write a program that uses the OpenSolaris / Solaris Nevada user-land file event notification API to locally monitor files and make copies, take snapshots, whatever. But this is not provided on the system, and it will be a lot coarser than you'd like (e.g., you won't be able to distinguish dirty pages naturally being written to disk from calls to close(2) or fsync(2)).

> Somewhat to my surprise, my presentation was a good success, and Q&A
> was focused on the event-driven backups, what the technical problems
> were, etc. A good handful of people approached me later, being curious
> and fascinated by the idea to replace the backup scheduler with an
> event-driven creation of the versions.
>
> Therefore, to me the case is closed; my presentation done, on the
> successful side.

Well, it hasn't been successful yet; once implementations appear that are actually useful, then it will have been successful. Although, in so far as you got people talking about high-level design issues, it has been a success, even if you don't like the content of those discussions.

To begin with, we (you) could certainly do a prototype that isn't necessarily generally useful as anything other than a proof of concept. You could begin with something like what is described above and which matches your vision. Then you could add some library of fs transaction marking APIs and modify some sample apps. Compare the results. In both cases you must try popular, complex apps and must attempt to restore from CDP backups.

Nico
--
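[Editor's sketch] The "program that monitors files and takes snapshots" Nico describes would use Solaris's event-ports file notification API natively; as a portable stand-in, here is an mtime-polling watcher. The handler and dataset name are illustrative; only the `zfs snapshot <dataset>@<name>` command itself is real:

```python
# Portable sketch of event-driven snapshots: watch a file and invoke a
# handler (e.g. "zfs snapshot") each time it changes. Solaris's native
# mechanism is the event-ports file notification API; this mtime-polling
# loop is only a stand-in for it.
import os
import subprocess
import time

def watch_and_react(path, on_change, poll_interval=0.5, max_events=None):
    """Call on_change(path) every time `path`'s mtime advances."""
    last = os.stat(path).st_mtime_ns
    events = 0
    while max_events is None or events < max_events:
        time.sleep(poll_interval)
        now = os.stat(path).st_mtime_ns
        if now != last:
            last = now
            on_change(path)
            events += 1

def take_zfs_snapshot(path):
    # Illustrative handler: assumes the file lives on dataset tank/home.
    name = "tank/home@auto-%d" % time.time()
    subprocess.run(["zfs", "snapshot", name], check=True)

# e.g. watch_and_react("/tank/home/report.odt", take_zfs_snapshot)
```

As Nico notes, any such watcher is coarse: it fires on page flushes as well as deliberate saves, which is exactly why app-level transaction marks would help.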
On Thu, Feb 28, 2008 at 07:55:45AM -0800, Joe Blount wrote:> * Application aware/driven CDP solves the file sanity challenge by > being explicitly told by the app. But this will have an inherently > limited market because it relies on application support. Basically: > it works, but requires coordination rarely found outside monopoly > owned stacks.I challenge the assumption that this has "an inherently limited market" -- if you get momentum for something like this then who knows, it might take off.
On Wed, 2008-02-27 at 13:43 -0500, Kyle McDonald wrote:> How was it MVFS could do this without any changes to the shells or any > other programs? > > I ClearCase could ''grep FOO /dir1/dir2/file@@/main/*'' to see which > version of ''file'' added FOO. > (I think @@ was the special hidden key. It might have been something > else though.)When I last used clearcase (on the order of 12 years ago) foo@@/ only worked within clearcase mvfs filesystems. It behaved as if the filesystem created a "foo@@" virtual directory for each real "foo" directory entry, but then filtered those names out of directory listings. Doing the same as an alternate "view" on snapshot space would be a simple matter of programming within ZFS, though the magic token/suffix to get you into version/snapshot space would likely not be POSIX compliant.. - Bill
Bill Sommerfeld wrote:> On Wed, 2008-02-27 at 13:43 -0500, Kyle McDonald wrote: > >> How was it MVFS could do this without any changes to the shells or any >> other programs? >> >> I ClearCase could ''grep FOO /dir1/dir2/file@@/main/*'' to see which >> version of ''file'' added FOO. >> (I think @@ was the special hidden key. It might have been something >> else though.) >> > > When I last used clearcase (on the order of 12 years ago) foo@@/ only > worked within clearcase mvfs filesystems. > > It behaved as if the filesystem created a "foo@@" virtual directory for > each real "foo" directory entry, but then filtered those names out of > directory listings. > > Doing the same as an alternate "view" on snapshot space would be a > simple matter of programming within ZFS, though the magic token/suffix > to get you into version/snapshot space would likely not be POSIX > compliant.. >Ahh. I suspected it should be ''possible'' to code it into ZFS. The reason it''s been left to runat instead seems to be POSIX compliance then? Maybe a FS level parameter could turn that processing on or off, and even allow the admin to redefine the ''@@'' to anything they wish? (VMS fans might like to set it to '';'' I suppose, but even then it wouldn''t be the same. ;) ) -Kyle> - Bill > > > > >
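[Editor's sketch] Bill's "simple matter of programming" could, at user level, mean rewriting a ClearCase-style `file@@/<snapshot>` path onto ZFS's real per-filesystem `.zfs/snapshot` namespace. The `@@` token and the translation below are hypothetical; only `.zfs/snapshot/<name>/<relative-path>` actually exists in ZFS:

```python
# Hypothetical mapping of "file@@/<snapshot>" paths onto the real ZFS
# .zfs/snapshot namespace. The @@ token is invented (borrowed from
# ClearCase); .zfs/snapshot/<name>/<rel-path> is real ZFS.
import os

def translate_at_at(path, fs_root):
    """Rewrite <fs_root>/<rel>@@/<snap> to <fs_root>/.zfs/snapshot/<snap>/<rel>."""
    if "@@" not in path:
        return path  # ordinary path, pass through untouched
    before, after = path.split("@@", 1)
    rel = os.path.relpath(before, fs_root)      # path relative to the fs root
    snap = after.lstrip("/")                    # snapshot name after the token
    return os.path.join(fs_root, ".zfs", "snapshot", snap, rel)
```

A kernel implementation would do this inside lookup, invisibly to readdir, which is precisely the POSIX-compliance problem Bill points out.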
>I suspected it should be ''possible'' to code it into ZFS. > >The reason it''s been left to runat instead seems to be POSIX compliance >then?It could still have used "//" pathnames (those have a POSIX reserved special meaning though that somewhat complicates pathname composition). E.g., a pathname of the form //@@file could be interpreted, I think, as the attributes of "file" in the current directory. Casper
Kyle McDonald wrote:
> Bill Sommerfeld wrote:
>> On Wed, 2008-02-27 at 13:43 -0500, Kyle McDonald wrote:
>>
>>> How was it MVFS could do this without any changes to the shells or
>>> any other programs?
>>>
>>> In ClearCase I could 'grep FOO /dir1/dir2/file@@/main/*' to see which
>>> version of 'file' added FOO.
>>> (I think @@ was the special hidden key. It might have been
>>> something else though.)
>>
>> When I last used clearcase (on the order of 12 years ago) foo@@/ only
>> worked within clearcase mvfs filesystems.
>>
>> It behaved as if the filesystem created a "foo@@" virtual directory for
>> each real "foo" directory entry, but then filtered those names out of
>> directory listings.
>>
>> Doing the same as an alternate "view" on snapshot space would be a
>> simple matter of programming within ZFS, though the magic token/suffix
>> to get you into version/snapshot space would likely not be POSIX
>> compliant..
>
> Ahh.
>
> I suspected it should be 'possible' to code it into ZFS.
>
> The reason it's been left to runat instead seems to be POSIX compliance
> then?

Yes, we have runat for POSIX compliance.

An earlier prototype of Solaris extended attributes utilized a /@/ syntax to enter xattr space. For example:

/data/file1/@/
/data/file1/@/attr.1
...
or
/data/dir1/@/

A readdir of /data/dir1 wouldn't show the @ directory, but you could always request to enter it.

This violated POSIX in a couple of ways. One, we took away the @ filename, and two, you can't have a directory on a file.

It was a really nice model, and I still kind of wish we could have integrated it that way.

-Mark
Mark Shellenbaum wrote:> Kyle McDonald wrote: >> Bill Sommerfeld wrote: >>> On Wed, 2008-02-27 at 13:43 -0500, Kyle McDonald wrote: >>> >>>> How was it MVFS could do this without any changes to the shells or >>>> any other programs? >>>> >>>> I ClearCase could ''grep FOO /dir1/dir2/file@@/main/*'' to see which >>>> version of ''file'' added FOO. >>>> (I think @@ was the special hidden key. It might have been >>>> something else though.) >>>> >>> When I last used clearcase (on the order of 12 years ago) foo@@/ only >>> worked within clearcase mvfs filesystems. >>> >>> It behaved as if the filesystem created a "foo@@" virtual directory for >>> each real "foo" directory entry, but then filtered those names out of >>> directory listings. >>> >>> Doing the same as an alternate "view" on snapshot space would be a >>> simple matter of programming within ZFS, though the magic token/suffix >>> to get you into version/snapshot space would likely not be POSIX >>> compliant.. >>> >> Ahh. >> >> I suspected it should be ''possible'' to code it into ZFS. >> >> The reason it''s been left to runat instead seems to be POSIX >> compliance then? > > Yes, we have runat for POSIX compliance. > > An earlier prototype of Solaris extended attributes utilized a /@/ > syntax to enter enter xattr space. For example: > > /data/file1/@/ > /data/file1/@/attr.1 > ... > or > /data/dir1/@/ > > A readdir of /data/dir1 wouldn''t show the @ directory, but you could > always request to enter it. > > This violated posix in a couple of ways. One we took away the @ > filename and two you can''t have a directory on a file. > > It was a really nice model, and I still kind of wish we could have > integrated it that way. >Why not resurrect the behavior, but default it to off, and leave it to the user to enable with a ZFS filesystem or pool attribute? -Kyle> -Mark
Bill Sommerfeld wrote:> > Doing the same as an alternate "view" on snapshot space would be a > simple matter of programming within ZFS, though the magic token/suffix > to get you into version/snapshot space would likely not be POSIX > compliant.. > >We already have a POSIX compliant file system for ZFS, implemented by the ZFS POSIX Layer (ZPL). We also have ZVols which don''t use the ZPL. Perhaps some enterprising soul could add another file system type to ZFS :-) Step right up! Invent something cool! Be the life of the party! Amaze your friends! -- richard
Is it possible to create a ZFS pool using a backing file created in xattr space? Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> Is it possible to create a ZFS pool using a backing file created in
> xattr space?

Why would you want to do that ?

I tried but could not get it to work with the CLI. However it may be possible via the (private) libzfs function call interface.

da64-x4500b-gmp03# cd /tmp
da64-x4500b-gmp03# runat
da64-x4500b-gmp03# touch silly
da64-x4500b-gmp03# runat silly mkfile 64m pool_file_1
da64-x4500b-gmp03# runat silly zpool create silly `pwd`/pool_file_1
cannot open '/tmp/pool_file_1': No such file or directory

Which is correct because it isn't in /tmp

--
Darren J Moffat