A new ARC case:

> There is a long-standing RFE for zfs to be able to describe what has
> changed between the snapshots of a dataset. To provide this
> capability, we propose a new 'zfs diff' sub-command. When run with
> appropriate privilege the sub-command describes what file system level
> changes have occurred between the requested snapshots. A diff between
> the current version of the file system and one of its snapshots is
> also supported.
>
> Five types of change are described:
>
> o File/Directory modified
> o File/Directory present in older snapshot but not newer
> o File/Directory present in newer snapshot but not older
> o File/Directory renamed
> o File link count changed

http://arc.opensolaris.org/caselog/PSARC/2010/105/

Via c0t0d0s0.org
One really good use for zfs diff would be: as a way to index zfs send backups by contents. Nico --
zfs diff is incredibly cool.
On 30-3-2010 0:39, Nicolas Williams wrote:
> One really good use for zfs diff would be: as a way to index zfs send
> backups by contents.
>
> Nico
>

Any estimate of the release target? snv_13x?

Bruno
On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> One really good use for zfs diff would be: as a way to index zfs send
> backups by contents.

Or to generate the list of files for incremental backups via NetBackup or similar. This is especially important for file systems with millions of files and relatively few changes.

--
Mike Gerdts
http://mgerdts.blogspot.com/
On 03/29/10 16:44, Mike Gerdts wrote:
> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com> wrote:
>> One really good use for zfs diff would be: as a way to index zfs send
>> backups by contents.
>
> Or to generate the list of files for incremental backups via NetBackup
> or similar. This is especially important for file systems with
> millions of files and relatively few changes.
>

Or, say, to keep the file index on your desktop up to date.

This gives everyone a way to access the changes in a filesystem in order(number of files changed) instead of order(number of files extant).

- Bart

--
Bart Smaalders
Solaris Kernel Performance
bart.smaalders at oracle.com
http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
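The difference between order(files changed) and order(files extant) can be illustrated with a minimal sketch. It assumes the change list has already been parsed into tuples. The one-letter codes ("M", "+", "-", "R") and the `apply_changes` helper are hypothetical names chosen for illustration, not anything defined by the ARC case.

```python
def apply_changes(index, changes):
    """Update a desktop-search style index in place.

    index   -- dict mapping path -> True if the file is already indexed
    changes -- iterable of tuples; the codes are assumed for illustration:
               ("M", path), ("+", path), ("-", path), ("R", old, new)

    The loop visits each change once, so the work is proportional to the
    number of changes rather than to the number of files in the index.
    """
    for change in changes:
        kind = change[0]
        if kind in ("M", "+"):
            index[change[1]] = False          # content must be (re)indexed
        elif kind == "-":
            index.pop(change[1], None)        # forget deleted files
        elif kind == "R":
            old, new = change[1], change[2]
            # A rename keeps the content, so carry the indexed state over.
            index[new] = index.pop(old, False)
    return index


index = {"a": True, "b": True, "c": True}
changes = [("M", "a"), ("-", "b"), ("R", "c", "d"), ("+", "e")]
print(apply_changes(index, changes))
```

Only the entries left marked `False` need to be read again by the indexer. Everything else in the index is untouched.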
On Mon, Mar 29, 2010 at 06:38:47PM -0400, David Magda wrote:
> A new ARC case:

I read this earlier this morning. Welcome news indeed!

I have some concerns about the output format, having worked with similar requirements in the past: in particular, as part of the monotone VCS when reporting workspace changes, and also as a consumer of similar-purpose output from rsync when building backup catalog databases of what changed each run.

I'm not familiar with the ARC process; where should these concerns be directed so as to make a difference? [pun only slightly intended]

At the risk of prompting discussion here rather than in the right place: these concerns relate in several ways to the use of the name as the only identifier.

For example, it's not clear that the proposed output lets me tell which new filenames are new links to which existing file, or even whether added file names are new files or just new links.

There will also need to be clear rules on output ordering, with respect to renames, where multiple changes have happened to renamed files.

Some of these concerns might be better addressed with clearer examples / use cases. Consider the common case of a file having been replaced with another: say, an editor that renames the old file, then creates and writes a new file with the same name. It may remove the old file, or keep it as a "backup" and maybe delete the previous backup. Would this be reported as a series of renames, adds and deletes (tracking the node), or merely as a content change (tracking the name)? Now consider that the file may have had links.

I'm concerned that the proposed output format does not represent this and similar cases well. I realise it's not intended to convey all of the details of what changed, merely to flag which files should be checked for further information.

I also think distinguishing content-change from attribute-change (e.g. chmod/chown) would be highly useful to potential consumers.

--
Dan.
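The question of whether an added name is a new file or just a new link can be made concrete with a minimal sketch. It assumes a consumer can obtain a path-to-inode-number mapping for each snapshot. The `classify_added_names` helper and its labels are hypothetical, and the sketch ignores inode-number reuse, which a real implementation would have to handle.

```python
def classify_added_names(old, new):
    """Classify names that exist only in the newer snapshot.

    old, new -- dicts mapping path -> inode number (an assumed input;
                the thread does not say how a consumer would get this).

    Returns a dict mapping each added path to one of:
      "new file" -- the inode did not exist in the older snapshot
      "new link" -- the inode existed and an old name for it survives
      "rename"   -- the inode existed but none of its old names survive
    """
    # Group the old names by inode so each lookup below is cheap.
    old_names = {}
    for path, ino in old.items():
        old_names.setdefault(ino, []).append(path)

    result = {}
    for path, ino in new.items():
        if path in old:
            continue                      # not an added name
        if ino not in old_names:
            result[path] = "new file"
        elif any(new.get(p) == ino for p in old_names[ino]):
            result[path] = "new link"
        else:
            result[path] = "rename"
    return result


old = {"a": 1, "b": 2}
new = {"a": 1, "a.link": 1, "c": 2, "d": 3}
print(classify_added_names(old, new))
```

A report keyed on names alone would list `a.link`, `c` and `d` as three additions. Tracking the node is what separates the three cases.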
On Tue, Mar 30, 2010 at 12:37:15PM +1100, Daniel Carosone wrote:
> There will also need to be clear rules on output ordering, with
> respect to renames, where multiple changes have happened to renamed
> files.

Separately, but relevant in particular to the above due to the potential for races: what is the defined behaviour when diffing against a live filesystem (rather than a snapshot)?

Is there an implied snapshot (i.e., a diff based on content frozen at the txg_id when the diff started), or is the comparison done against a moving target?

It's not just a question of implementation if it can affect the output, especially if it can make it internally inconsistent.

--
Dan.
On 3/29/10 8:02 PM, Daniel Carosone wrote:
> On Tue, Mar 30, 2010 at 12:37:15PM +1100, Daniel Carosone wrote:
>
>> There will also need to be clear rules on output ordering, with
>> respect to renames, where multiple changes have happened to renamed
>> files.
>>
> Separately, but relevant in particular to the above due to the
> potential for races: what is the defined behaviour when diffing
> against a live filesystem (rather than a snapshot)?
>
> Is there an implied snapshot (i.e., a diff based on content frozen at
> the txg_id when the diff started), or is the comparison done against
> a moving target?
>
> It's not just a question of implementation if it can affect the
> output, especially if it can make it internally inconsistent.
>
> --
> Dan.
>

Yes, a snapshot is taken, and it is removed once the comparison has been performed.

-tim
On 03/30/10 12:44 PM, Mike Gerdts wrote:
> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com> wrote:
>
>> One really good use for zfs diff would be: as a way to index zfs send
>> backups by contents.
>>
> Or to generate the list of files for incremental backups via NetBackup
> or similar. This is especially important for file systems with
> millions of files and relatively few changes.
>

Or to generate the list of files for virus scanning!

--
Ian.
> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com> wrote:
> > One really good use for zfs diff would be: as a way to index zfs send
> > backups by contents.
>
> Or to generate the list of files for incremental backups via NetBackup
> or similar. This is especially important for file systems with
> millions of files and relatively few changes.

+1

The reason "zfs send" is so fast is not raw speed. It is that it spends no time indexing, comparing, and analyzing which files have changed since the last snapshot or increment.

If the "zfs diff" command could generate the list of changed files, and you fed that list into tar or whatever, then these third-party backup tools would suddenly become much more effective, able to rival the performance of "zfs send".
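Turning a change list into a file list for tar or a similar tool might look like the following minimal sketch. The thread does not specify the output format of "zfs diff", so the tab-separated layout and the one-letter codes used here are assumptions for illustration only, and `paths_to_back_up` is a hypothetical helper.

```python
def paths_to_back_up(diff_lines):
    """Return the paths a file-level backup tool would need to read.

    Assumed (not specified in the thread) line format, tab-separated:
        M<TAB>path          modified
        +<TAB>path          present only in the newer snapshot
        -<TAB>path          present only in the older snapshot
        R<TAB>old<TAB>new   renamed
    """
    wanted = []
    for line in diff_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        change = fields[0]
        if change in ("M", "+"):
            wanted.append(fields[1])
        elif change == "R":
            wanted.append(fields[2])      # back up under the new name
        # "-" entries have nothing left to read, so they are skipped.
    return wanted


sample = [
    "M\t/tank/home/notes.txt",
    "+\t/tank/home/new.txt",
    "-\t/tank/home/old.txt",
    "R\t/tank/home/a.txt\t/tank/home/b.txt",
]
for path in paths_to_back_up(sample):
    print(path)
```

The resulting list could be written to a file and handed to an archiver that accepts a list of paths. Directories reported as modified would need filtering first, since most archivers recurse into a directory named on their input.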