Jim Klimov
2012-Jan-13 01:00 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
While reading about ZFS on-disk formats, I wondered once again why it is not possible to create a snapshot of existing data, not of the current TXG but of some older point in time. From what I gathered, the definition of a snapshot requires only a cut-off TXG number and the existence of blocks in the dataset with smaller-or-equal birth TXGs. It seems like just a coincidence that the current TXG is used and older TXGs aren't. Is it deemed inconvenient/impractical/useless/not-thought-of, or are there some fundamental or technological drawbacks to the idea?

Note: this idea is related to my proposal in the October thread "[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level" and could aid "restartable zfs send" by creating smaller snapshots for incremental sending of existing large datasets.

Today I had a new twist on the idea, though: as I wrote in other posts, my raidz2 did not help protect some of my data. One of the damaged files belongs to a stack of snapshots that are continually replicated from another box, and the inconsistent on-disk block is referenced in an old snapshot (almost at the root of the stack). Resending and re-receiving the whole stack of snapshots is possible, but inconvenient and slow. Rsyncing just the difference (good data instead of the IO-erroring byte range) to repair the file would forfeit further incremental snapshot syncs.

So I thought: it would be nice if it were possible (perhaps not now, but in the future as an RFE) to resend and replace just that snapshot in the middle or even at the root of the stack. Perhaps even better, with ZDB or some other tools I might determine which blocks have rotted and which TXG they belonged to, and I'd "fence" that TXG on the source and destination systems with the proposed "injected snapshots". Older and newer snapshots around this TXG range would provide incremental changes to data, as they normally do, and I'd only quickly replace a small intermediate snapshot.
All this needs is a couple of not-yet-existing features...

PS: I think this idea might even have some "business case" foundation for active-passive clusters with zfs send updating a passive cluster node. Whenever a scrub on one of the systems finds an unrecoverable block in older data, the node might request "just it" from the other head. Likewise for backups to removable media, etc. If we already have ZFS-based storage similar to an out-of-sync mirror, why not use the available knowledge of known-good blocks to repair detected {small} errors in large volumes of "same" data?

What do you think?..

//Jim Klimov
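To illustrate the definitional point at the start of this post, here is a toy model (plain Python, not ZFS code; all names are mine) of the rule that makes a snapshot "just" a cut-off TXG: when a block is freed from the live dataset, it is kept on a deadlist if its birth TXG is at or below the newest snapshot's TXG, because that snapshot still references it.

```python
# Toy model (not ZFS code): a snapshot is essentially a cut-off TXG.
# A block freed in the live dataset is retained on a deadlist if its
# birth TXG is <= the most recent snapshot's TXG (the snapshot still
# references it); otherwise it goes back to the space allocator.

class Dataset:
    def __init__(self):
        self.snapshots = []      # TXG numbers of snapshots, ascending
        self.deadlist = []       # birth TXGs of blocks held only by snapshots
        self.free_pool = []      # birth TXGs of blocks returned for reuse

    def snapshot(self, current_txg):
        self.snapshots.append(current_txg)

    def free_block(self, birth_txg):
        if self.snapshots and birth_txg <= self.snapshots[-1]:
            self.deadlist.append(birth_txg)   # still visible in a snapshot
        else:
            self.free_pool.append(birth_txg)  # no snapshot holds it: reuse

ds = Dataset()
ds.snapshot(current_txg=100)
ds.free_block(birth_txg=90)    # born before the snapshot: kept on deadlist
ds.free_block(birth_txg=150)   # born after the snapshot: truly freed
```

The "injected snapshot" idea amounts to choosing a cut-off TXG in the past; the catch, discussed later in the thread, is whether blocks born before that TXG and freed since then still exist on disk.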
Jim Klimov
2012-Jan-13 05:23 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
2012-01-13 7:26, Steve Gonczi wrote:
> Jim,
>
> Any modified block (in absence of a snapshot) gets re-written
> to a new location and the original block is freed.
>
> So the earlier state you want to go back and snapshot is no longer there.
>
> The essence of taking a snapshot is keeping the original blocks
> instead of freeing them.

Perhaps I need to specify some use cases more clearly:

1) A snapshot added in between existing snapshots, or even before the first one currently existing, just to facilitate incremental snapshot sends in small chunks over lousy media (where zfs send is likely to never succeed for huge datasets sent as one initial stream).

2) Cloning and/or rollback of a dataset at some point in time (TXG number) which I forgot to snapshot in a timely manner. Apparently, this would only work to ignore added data, since overwritten blocks would be lost. Exception: there is a "last chance" to reference the last 32-128 TXGs, whose uberblocks still exist in the ring. Say, 128 * 5 sec = 640 sec, roughly 10.7 minutes, of rollback info guaranteed not to be overwritten by ZFS COW. This would compensate for most of those "Oh sh*t, what have I done!?" moments of operator/admin errors, typos, etc. Injecting a snapshot into "3 minutes ago" would help retain data not actually deleted from disk while you go about repairing the damage ;) Perhaps this would even allow for undeletion of datasets which you never intended to destroy (notably, I had LU BE deletion trying to kill off my zone datasets some time around snv_101 or so; they were only saved by being mounted and running at the time).

3) Use along with the proposed replacement of existing snapshots (with degraded unreadable blocks) while maintaining the rest of the snapshot/clone tree. If this "technology" were to be implemented, injected snaps could naturally be used to "fence off" the corrupted area (TXG number range) and replace the resulting smaller corrupt snapshot with good data from another storage.
I hope it is not theoretically impossible to write this replacement snapshot in such a manner that the resulting sequence of block histories would still make sense as valid files. This block reallocation is not much different from autorepairs on resilver or scrub... I think :) Thanks, //Jim
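The uberblock-ring arithmetic in use case 2 above can be sketched as follows (a toy calculation, not ZFS code; the 128-slot ring and 5-second TXG commit interval are the values assumed in the post):

```python
# Toy model of the uberblock ring: each pool label keeps the last
# N uberblocks in a ring buffer, so the oldest still-referenced
# TXG trails the current one by at most N transaction groups.
RING_SLOTS = 128        # uberblock ring size assumed in the post
TXG_INTERVAL_SEC = 5    # assumed TXG commit interval

def rollback_window_seconds(slots=RING_SLOTS, interval=TXG_INTERVAL_SEC):
    # Worst-case wall-clock span covered by the ring.
    return slots * interval

def oldest_ringed_txg(current_txg, slots=RING_SLOTS):
    # The ring holds TXGs (current - slots + 1) .. current.
    return max(0, current_txg - slots + 1)

print(rollback_window_seconds())   # 640 seconds, about 10.7 minutes
print(oldest_ringed_txg(100000))   # 99873
```

As Matt points out later in the thread, the ring only guarantees that the uberblocks themselves survive, not that the blocks they reference do.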
Edward Ned Harvey
2012-Jan-13 13:24 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> Perhaps I need to specify some use cases more clearly:

Actually, I'm not sure you do need to specify use cases more clearly, because the idea is obviously awesome. The main problem, if you're interested, is getting attention. Maybe it's more work than I know, but I agree with you; at first blush it doesn't sound like much work.

I think the most compelling use case you mentioned was the ability to resume an interrupted zfs send. It's one of those things where it's not super-super useful (most people are content with whatever snapshot and zfs send scheme they already have today), but if it's not much work, then maybe it's worthwhile anyway.

But there's a finite amount of development resource, and there are other features in higher demand (such as BP rewrite, etc). Why would Oracle or Nexenta care about devoting the effort? Maybe it's possible; maybe there just isn't enough motivation...
Matthew Ahrens
2012-Jan-16 19:14 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> While reading about zfs on-disk formats, I wondered once again
> why is it not possible to create a snapshot on existing data,
> not of the current TXG but of some older point-in-time?

It is not possible because the older data may no longer exist on-disk. For example, you want to take a snapshot from 10 txg's ago. But since then we have created a new file, which modified the containing directory. So we freed the directory block from 10 txg's ago. That freed block is then a candidate for reallocation.

Existence of old uberblocks in the ring buffer does not indicate that the data they reference is still valid. This is the reason that "zpool import -F" does not always work.

--matt
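Matt's point can be demonstrated with a small simulation (a toy model, not ZFS code; all names are mine): once copy-on-write frees the old copy of a block and the allocator reuses that address, the pointer held by an old uberblock silently refers to unrelated data.

```python
# Toy model: COW frees the old copy of a modified block; once the
# allocator reuses that address, an old uberblock's pointer into it
# is stale. This is why "zpool import -F" cannot always rewind.

disk = {}            # block address -> (birth_txg, payload)
free_addrs = []      # addresses available for reallocation

def write(addr, txg, data):
    disk[addr] = (txg, data)

def cow_overwrite(old_addr, new_addr, txg, data):
    write(new_addr, txg, data)
    free_addrs.append(old_addr)   # old copy freed (no snapshot holds it)

def reallocate(txg, data):
    addr = free_addrs.pop(0)      # allocator reuses a freed address
    write(addr, txg, data)
    return addr

write(addr=1, txg=10, data="dir-block-v1")
uberblock_txg10 = {"root": 1}     # old uberblock points at address 1
cow_overwrite(old_addr=1, new_addr=2, txg=11, data="dir-block-v2")
reallocate(txg=12, data="unrelated-file-data")

# Rewinding via the txg-10 uberblock now reads garbage:
birth, payload = disk[uberblock_txg10["root"]]
print(birth, payload)   # 12 unrelated-file-data, not the txg-10 directory
```

In real ZFS the birth-TXG mismatch (12 instead of 10) and the block checksum would expose the staleness, but the original data is gone either way.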
Jim Klimov
2012-Jan-16 19:34 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
2012-01-16 23:14, Matthew Ahrens wrote:
> On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov <jimklimov at cos.ru
> <mailto:jimklimov at cos.ru>> wrote:
>
>     While reading about zfs on-disk formats, I wondered once again
>     why is it not possible to create a snapshot on existing data,
>     not of the current TXG but of some older point-in-time?
>
> It is not possible because the older data may no longer exist on-disk.
> For example, you want to take a snapshot from 10 txg's ago. But since
> then we have created a new file, which modified the containing
> directory. So we freed the directory block from 10 txg's ago. That
> freed block is then a candidate for reallocation.
>
> Existence of old uberblocks in the ring buffer does not indicate that
> the data they reference is still valid. This is the reason that "zpool
> import -F" does not always work.

Hmmm... the way I got it (but again, I have no prooflinks handy) was that ZFS "recently" got a deferred-reuse feature to guarantee just those rollbacks, basically. I am not sure which builds or distros that might be included in. If you authoritatively say it's not there (or not in illumos), I'm going to trust you ;)

What about injecting snapshots into static data, before at least one existing snapshot? Is that possible? I do get your point about missing older directory data and the possible invalidity of the snapshot as a ZPL dataset (and probably a bad basis for a writeable clone)... but let's call them checkpoints then, and limit their use to zfs send and fencing of erred ranges ;) Is that technically possible or logically reasonable?

Thanks,
//Jim
Matthew Ahrens
2012-Jan-16 20:39 UTC
[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
On Mon, Jan 16, 2012 at 11:34 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> 2012-01-16 23:14, Matthew Ahrens wrote:
>> On Thu, Jan 12, 2012 at 5:00 PM, Jim Klimov <jimklimov at cos.ru
>> <mailto:jimklimov at cos.ru>> wrote:
>>
>>     While reading about zfs on-disk formats, I wondered once again
>>     why is it not possible to create a snapshot on existing data,
>>     not of the current TXG but of some older point-in-time?
>>
>> It is not possible because the older data may no longer exist on-disk.
>> For example, you want to take a snapshot from 10 txg's ago. But since
>> then we have created a new file, which modified the containing
>> directory. So we freed the directory block from 10 txg's ago. That
>> freed block is then a candidate for reallocation.
>>
>> Existence of old uberblocks in the ring buffer does not indicate that
>> the data they reference is still valid. This is the reason that "zpool
>> import -F" does not always work.
>
> Hmmm... the way I got it (but again have no prooflinks handy)
> was that ZFS "recently" got a deferred-reuse feature to just
> guarantee those rollbacks, basically. I am not sure which
> builds or distros that might be included in.
>
> If you authoritatively say it's not there (or not in illumos),
> I'm going to trust you ;)

It's definitely not there in Illumos. See TXG_DEFER_SIZE. There was talk of changing it at Oracle; I don't know if that ever happened. If you have an S11 system you could probably use mdb to look at the size of the ms_defermap.

--matt
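For reference, the deferred-free behavior under discussion can be sketched like this (a toy model of the idea behind TXG_DEFER_SIZE, not the actual metaslab code; in illumos TXG_DEFER_SIZE is 2, far short of the 32-128 TXGs the rollback idea would need):

```python
# Toy model of deferred frees: space freed in txg N is not handed
# back to the allocator until txg N + TXG_DEFER_SIZE has synced,
# so only the last couple of TXGs are guaranteed rewindable.
TXG_DEFER_SIZE = 2   # value in illumos (sys/txg.h)

class Metaslab:
    def __init__(self):
        self.defermap = {}        # txg of free -> list of freed addresses
        self.allocatable = set()  # addresses the allocator may hand out

    def free(self, addr, txg):
        self.defermap.setdefault(txg, []).append(addr)

    def sync(self, current_txg):
        # Frees from sufficiently old TXGs graduate to allocatable space.
        for txg in list(self.defermap):
            if txg <= current_txg - TXG_DEFER_SIZE:
                self.allocatable.update(self.defermap.pop(txg))

ms = Metaslab()
ms.free(addr=7, txg=100)
ms.sync(current_txg=101)
print(7 in ms.allocatable)   # False: still within the defer window
ms.sync(current_txg=102)
print(7 in ms.allocatable)   # True: reusable once the window passes
```

So the defer window protects crash rewind across a couple of TXGs, not the minutes-long rollback window the earlier posts hoped the uberblock ring would provide.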