thr3ads.net - zfs discuss - [zfs-discuss] zfs diff [Mar 2010]


David Magda

2010-Mar-29 22:38 UTC

[zfs-discuss] zfs diff

A new ARC case:
> There is a long-standing RFE for zfs to be able to describe what has
> changed between the snapshots of a dataset.  To provide this
> capability, we propose a new 'zfs diff' sub-command.  When run with
> appropriate privilege the sub-command describes what file system level
> changes have occurred between the requested snapshots.  A diff between
> the current version of the file system and one of its snapshots is
> also supported.
>
> Five types of change are described:
>
> o    File/Directory modified
> o    File/Directory present in older snapshot but not newer
> o    File/Directory present in newer snapshot but not older
> o    File/Directory renamed
> o    File link count changed

http://arc.opensolaris.org/caselog/PSARC/2010/105/

Via c0t0d0s0.org

Nicolas Williams

2010-Mar-29 22:39 UTC

[zfs-discuss] zfs diff

One really good use for zfs diff would be: as a way to index zfs send
backups by contents.

Nico
--

Nicolas Williams

2010-Mar-29 23:04 UTC

[zfs-discuss] zfs diff

zfs diff is incredibly cool.

Bruno Sousa

2010-Mar-29 23:31 UTC

[zfs-discuss] zfs diff

On 30-3-2010 0:39, Nicolas Williams wrote:
> One really good use for zfs diff would be: as a way to index zfs send
> backups by contents.
>
> Nico
>   

Any forecast for the release target? snv_13x?

Bruno


Mike Gerdts

2010-Mar-29 23:44 UTC

[zfs-discuss] zfs diff

On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
<Nicolas.Williams at sun.com> wrote:
> One really good use for zfs diff would be: as a way to index zfs send
> backups by contents.

Or to generate the list of files for incremental backups via NetBackup
or similar.  This is especially important for file systems with
millions of files with relatively few changes.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/

Bart Smaalders

2010-Mar-30 00:50 UTC

[zfs-discuss] zfs diff

On 03/29/10 16:44, Mike Gerdts wrote:
> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com>  wrote:
>> One really good use for zfs diff would be: as a way to index zfs send
>> backups by contents.
>
> Or to generate the list of files for incremental backups via NetBackup
> or similar.  This is especially important for file systems with
> millions of files with relatively few changes.
>
Or, say, to keep indexing files on your desktop...
This gives everyone a way to access the changes in a filesystem in
order(number of files changed) time instead of order(number of files extant).
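
To illustrate the point about cost, here is a minimal Python sketch of an index that is updated from a list of change records rather than rebuilt by walking every file. The record shape `(kind, path)` and the names `apply_changes` and `read` are assumptions made for this example; they are not the output format proposed in the ARC case.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical change record: (kind, path), where kind is "added",
# "removed", or "modified". This is NOT the proposed zfs diff format.
Change = Tuple[str, str]


def apply_changes(
    index: Dict[str, str],
    changes: Iterable[Change],
    read: Callable[[str], str],
) -> int:
    """Update a path -> content index, touching only the changed paths.

    `read` is supplied by the caller and returns the content to index
    for a path. Returns the number of paths visited.
    """
    visited = 0
    for kind, path in changes:
        visited += 1
        if kind == "removed":
            index.pop(path, None)
        else:  # "added" or "modified"
            index[path] = read(path)
    return visited


# Current state of a tiny file system, and a stale index of its past state.
files = {"/a": "alpha", "/b": "beta v2", "/d": "delta"}
index = {"/a": "alpha", "/b": "beta", "/c": "gamma"}
changes = [("modified", "/b"), ("removed", "/c"), ("added", "/d")]

# Only the three changed paths are visited; "/a" is never read.
visited = apply_changes(index, changes, files.__getitem__)
```

The work done is proportional to the length of `changes`, however many unchanged files the index already covers.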

- Bart


-- 
Bart Smaalders			Solaris Kernel Performance
bart.smaalders at oracle.com	http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."

Daniel Carosone

2010-Mar-30 01:37 UTC

[zfs-discuss] zfs diff

On Mon, Mar 29, 2010 at 06:38:47PM -0400, David Magda wrote:
> A new ARC case:

I read this earlier this morning. Welcome news indeed!

I have some concerns about the output format, having worked with
similar requirements in the past: in the monotone VCS, when reporting
workspace changes, and as a consumer of similar-purpose output from
rsync, when building backup catalog databases of what changed each run.

I'm not familiar with the ARC process; where should these concerns be
directed so as to make a difference? [pun only slightly intended]

At the risk of prompting discussion here rather than in the right
place: these concerns relate in several ways to the use of the name as
the only identifier.

For example, it's not clear that the proposed output lets
me tell which new filenames are new links to which existing file, or
even whether added file names are new files or just new links.

There will also need to be clear rules on output ordering, with
respect to renames, where multiple changes have happened to renamed
files.

Some of these concerns might be better addressed with clearer examples
/ use cases. Consider the common case of a file having been replaced
with another: say, an editor that renames the old file and creates and
rewrites a new file with the same name. It may remove the old, or keep
it as a "backup" and maybe delete the previous backup. Would this be
reported as a series of renames and adds and deletes (tracking the
node), or merely as a content change (tracking the name)? Now consider
that the file may have had links.

I'm concerned that the proposed output format does not represent this
and similar cases well. I realise it's not intended to convey all of
the details of what changed, merely to flag which files should be
checked for further information.

I also think distinguishing content-change from attribute-change
(e.g. chmod/chown) would be highly useful to potential consumers.
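
To make these cases concrete, here is a small Python sketch of a comparison that tracks the node as well as the name. Everything in it is hypothetical: the `Entry` fields, the `compare` function, and the wording of the reported lines are invented for illustration and say nothing about how zfs diff works or what it will print.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    inode: int     # identity of the underlying file
    data_gen: int  # bumps when content changes (hypothetical field)
    mode: int      # permission bits, standing in for all attributes


Snapshot = Dict[str, Entry]


def compare(old: Snapshot, new: Snapshot) -> List[str]:
    """Describe changes between two snapshots, tracking inodes and names."""
    out: List[str] = []
    old_inodes = {e.inode for e in old.values()}
    new_inodes = {e.inode for e in new.values()}

    # Names that vanished, grouped by inode, so renames can be paired up.
    gone: Dict[int, List[str]] = {}
    for name in sorted(old.keys() - new.keys()):
        gone.setdefault(old[name].inode, []).append(name)

    for name in sorted(new.keys() - old.keys()):
        ino = new[name].inode
        if gone.get(ino):
            out.append(f"renamed {gone[ino].pop(0)} -> {name}")
        elif ino in old_inodes:
            out.append(f"new link {name} (inode {ino})")
        else:
            out.append(f"new file {name}")

    # Vanished names that were not consumed by a rename.
    for names in gone.values():
        for name in names:
            still_there = old[name].inode in new_inodes
            out.append(f"{'link removed' if still_there else 'file removed'} {name}")

    for name in sorted(old.keys() & new.keys()):
        o, n = old[name], new[name]
        if o.inode != n.inode:
            out.append(f"replaced {name} (inode {o.inode} -> {n.inode})")
        elif o.data_gen != n.data_gen:
            out.append(f"content changed {name}")
        elif o.mode != n.mode:
            out.append(f"attributes changed {name}")
    return out


# An editor saves "doc": the old file becomes the backup, a new file takes the name.
editor_save = compare(
    {"doc": Entry(10, 1, 0o644), "doc~": Entry(9, 1, 0o644)},
    {"doc": Entry(11, 1, 0o644), "doc~": Entry(10, 1, 0o644)},
)

# A chmod, a new hard link, and a rename in one interval.
mixed = compare(
    {"a": Entry(1, 1, 0o644), "b": Entry(2, 1, 0o644)},
    {"a": Entry(1, 1, 0o600), "a2": Entry(1, 1, 0o600), "c": Entry(2, 1, 0o644)},
)
```

In the editor-save case a name-only comparison would see two modified files, while the node-tracking view shows that both names now point at different files. Which of these a consumer wants is exactly the question for the output format.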

--
Dan.

Daniel Carosone

2010-Mar-30 02:02 UTC

[zfs-discuss] zfs diff

On Tue, Mar 30, 2010 at 12:37:15PM +1100, Daniel Carosone wrote:
> There will also need to be clear rules on output ordering, with
> respect to renames, where multiple changes have happened to renamed
> files.

Separately, but relevant in particular to the above due to the
potential for races: what is the defined behaviour when diffing
against a live filesystem (rather than a snapshot)?

Is there an implied snapshot (i.e., a diff based on content frozen at
txg_id when started), or is the comparison done against a moving
target?

It's not just a question of implementation if it can affect the
output, especially if it can make it internally inconsistent.

--
Dan.

Tim Haley

2010-Mar-30 03:05 UTC

[zfs-discuss] zfs diff

On 3/29/10 8:02 PM, Daniel Carosone wrote:
> On Tue, Mar 30, 2010 at 12:37:15PM +1100, Daniel Carosone wrote:
>
>> There will also need to be clear rules on output ordering, with
>> respect to renames, where multiple changes have happened to renamed
>> files.
>>
> Separately, but relevant in particular to the above due to the
> potential for races: what is the defined behaviour when diffing
> against a live filesystem (rather than a snapshot)?
>
> Is there an implied snapshot (i.e., a diff based on content frozen at
> txg_id when started), or is the comparison done against a moving
> target?
>
> It's not just a question of implementation if it can affect the
> output, especially if it can make it internally inconsistent.
>
> --
> Dan.
>
Yes, a snapshot is taken and removed once the compare is performed.
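
The same take-compare-remove pattern can be sketched at user level in Python. This only illustrates the lifecycle and is not how the sub-command does it internally. `zfs snapshot` and `zfs destroy` are existing sub-commands; the helper names, the dataset `tank/home`, the tag `diff-tmp`, and the stand-in "compare" step are all assumptions made for the example.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run_zfs(args: List[str]) -> None:
    """Run a zfs sub-command, raising if it fails."""
    subprocess.run(["zfs", *args], check=True)


@contextlib.contextmanager
def temporary_snapshot(dataset: str, tag: str, run: Runner = run_zfs) -> Iterator[str]:
    """Snapshot `dataset`, yield the snapshot name, then destroy it."""
    name = f"{dataset}@{tag}"
    run(["snapshot", name])
    try:
        yield name
    finally:
        # Runs even if the comparison raises, so the snapshot never lingers.
        run(["destroy", name])


# Dry run with a recording runner instead of the real zfs binary.
calls: List[List[str]] = []
with temporary_snapshot("tank/home", "diff-tmp", run=calls.append) as snap:
    # Stand-in for the comparison against an older snapshot.
    calls.append(["compare", "tank/home@monday", snap])
```

Because the comparison runs against a frozen snapshot rather than the live file system, its output cannot be made inconsistent by writes that land while it is running.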

-tim

Ian Collins

2010-Mar-30 04:57 UTC

[zfs-discuss] zfs diff

On 03/30/10 12:44 PM, Mike Gerdts wrote:
> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com>  wrote:
>
>> One really good use for zfs diff would be: as a way to index zfs send
>> backups by contents.
>>
> Or to generate the list of files for incremental backups via NetBackup
> or similar.  This is especially important for file systems with
> millions of files with relatively few changes.
>
Or to generate the list of files for virus scanning!

-- 
Ian.

Edward Ned Harvey

2010-Mar-30 11:58 UTC

[zfs-discuss] zfs diff

> On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams
> <Nicolas.Williams at sun.com> wrote:
> > One really good use for zfs diff would be: as a way to index zfs send
> > backups by contents.
> 
> Or to generate the list of files for incremental backups via NetBackup
> or similar.  This is especially important for file systems with
> millions of files with relatively few changes.
+1

The reason "zfs send" is so fast is not that it is inherently fast.
It is that it does not need any time to index, compare, and analyze
which files have changed since the last snapshot or increment.

If the "zfs diff" command could generate the list of changed files, and
you feed that into tar or whatever, then these 3rd party backup tools
suddenly become much more effective, able to rival the performance of
"zfs send."
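
A rough Python sketch of the "feed that into tar" idea, using the standard `tarfile` module. The function name `archive_changed` and the assumption that the changed files arrive as a plain list of paths relative to the file system root are inventions for this example; the real output of "zfs diff" would need parsing first.

```python
import os
import tarfile
from typing import Iterable


def archive_changed(root: str, changed: Iterable[str], dest: str) -> int:
    """Write only the listed files (relative to root) into a tar archive.

    Returns the number of entries written. Paths that no longer exist
    (for example, files reported as deleted) are skipped.
    """
    count = 0
    with tarfile.open(dest, "w") as tar:
        for rel in changed:
            full = os.path.join(root, rel)
            if os.path.lexists(full):
                # recursive=False: add exactly what was listed, nothing more.
                tar.add(full, arcname=rel, recursive=False)
                count += 1
    return count
```

The backup tool never walks the tree. Its cost depends on the length of the changed list, which is the property that makes "zfs send" quick.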
