Jim Klimov
2011-Oct-29 17:57 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
Hello all,

I am catching up with some 500 posts that I skipped this summer, and came up with a new question. In short, is it possible to add "restartability" to ZFS SEND, for example by adding artificial snapshots (of configurable increment size) into already existing datasets [too large to be zfs-sent successfully as one chunk of stream data]? I'll start with the prehistory of this question, and continue with the detailed idea below.

On one hand, there was a post about a T2000 system kernel panicking while trying to import a pool. It was probable that the pool was receiving a large (3Tb) zfs send stream, and the receive was aborted due to some external issues. Afterwards the pool apparently got into a cycle of trying to destroy the received part of the stream during each pool import attempt, exhausted all RAM and hung the server. From my experience reported this spring to the forums (alas, which are now gone - and the forums-to-mail replication did not work at that time) and to the Illumos bugtracker, I hope that the OP's pool did get imported after a few weeks of power cycles. I had different conditions (destroying some snapshots and datasets on a deduped pool) with a similar effect.

On the other hand, there was a discussion (actually, lots of them) about "rsync vs. zfs send". My new question couples these threads. I know it has been discussed a number of times that ZFS SEND is more efficient at finding differences and sending updates than a filesystem crawl that calculates checksums all over again. However, RSYNC has an important benefit of being restartable. As shown by the first post I mentioned, a broken ZFS SEND operation can lead to long downtimes. With sufficiently large increments (i.e. the initial stream of a large dataset), low bandwidths and a high probability of network errors or power glitches, it may even be guaranteed that a single ZFS SEND operation never transfers enough data to complete; for example, when replicating 3Tb over a few-Kbps subscriber-level internet link which is reset every 24 hours for the ISP's traffic accounting reasons. By contrast, it is easy to construct an rsync loop which would transfer all the files after several weeks of hard work (a sketch of such a loop follows at the end of this post). But that would not be a ZFS-snapshot replica, so further updates cannot be made via ZFS SEND either - locking the user into rsync loops forever.

Now, I wondered if it is possible to embed snapshots (or some similar construct) into existing data, for the purpose of keeping tabs during zfs send and zfs recv? For example, the same existing 3Tb dataset could be artificially pre-represented as a horde of snapshots, each utilizing 1Gb of disk space, with valid ZFS incremental sends over whatever network link we have. However, unlike zfs-auto-snap, these snapshots would not really have appeared on-disk while the dataset was being written (historically). Instead, they would be patched on by the admins after the factual data appeared on disk, before the ZFS SEND.

Alternatively, if the ZFS SEND is detected to have been broken, the sending side might set a "tab" at the offset where it last read the sent data. The receiver (upon pool import or whatever other recovery) would also set such a tab, instead of destroying the broken snapshot (which may take weeks and lots of downtime, as proved by several reports on this list, including mine) and restarting from scratch - likely doomed to be broken as well.
In terms of code this would probably be like the normal "zfs snapshot" mixed with the reverse of "zfs destroy @snapshot", meaning that some existing blocks would be reassigned as "owned" by a newly embedded snapshot instead of being "owned" by the live dataset or some more recent snapshot...

//Jim
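A minimal sketch of the rsync loop mentioned above - assuming ssh transport and a GNU rsync with --partial support; hosts and paths are illustrative:

    #!/bin/sh
    # Retry rsync until one full pass completes successfully.
    # --partial keeps partially-transferred files between attempts,
    # so each retry resumes roughly where the previous one died.
    SRC=/tank/bigdataset/
    DST=backuphost:/backup/bigdataset/
    until rsync -a --partial --timeout=300 "$SRC" "$DST"; do
        echo "rsync interrupted, retrying in 60 seconds..." >&2
        sleep 60
    done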
Edward Ned Harvey
2011-Oct-29 22:14 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> summer, and came up with a new question. In short, is it
> possible to add "restartability" to ZFS SEND, for example

Rather than building something new and special into the filesystem, would something like a restartable/continuable mbuffer command do the trick? It seems to be a general issue, not filesystem specific - that you want to tunnel some command or data stream through a buffering (perhaps even checksumming/error-detecting/correcting) system, to make it more resilient crossing a WAN or whatever. There is probably already a utility like that. I quickly checked mbuffer to see if it did, but it didn't seem to. I didn't look very deeply; I could be wrong.
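For reference, the usual way mbuffer is paired with zfs send/recv - a sketch only; the hostname, port, and buffer sizes are illustrative:

    # On the receiving host: listen on a port, buffer up to 1 GB in RAM,
    # and feed the stream into zfs recv.
    mbuffer -s 128k -m 1G -I 9090 | zfs recv tank/copy

    # On the sending host: stream the snapshot through a local buffer
    # out to the receiver.
    zfs send tank/data@snap1 | mbuffer -s 128k -m 1G -O backuphost:9090

This smooths out stalls and bursts on either end, but as noted above, if the connection itself dies the whole stream must be restarted from the beginning - mbuffer keeps no on-disk state to resume from.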
Jim Klimov
2011-Oct-30 19:11 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
2011-10-30 2:14, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Jim Klimov
>>
>> summer, and came up with a new question. In short, is it
>> possible to add "restartability" to ZFS SEND, for example
>
> Rather than building something new and special into the filesystem, would
> something like a restartable/continuable mbuffer command do the trick?

Well, it is true that for the purpose of sending a replication stream over a flaky network, some sort of restartable buffer program might suffice. If one or both machines were rebooted in the process, however, this would get us into the situation where all the incomplete-snapshot data was sent in vain, and the receiver has to destroy that data, which may even get it to crash during pool import. Afterwards the send attempt has to be made again, and if the conditions were such that any attempt is likely to fail - it likely will. Not all of our machines live in ivory-tower datacenters ;)

Per Paul Kraus (who recently wrote about similar problems):
> Uhhh, not being able to destroy snapshots that are "too big"
> is a pretty big one for us

Inserting artificial snapshots into existing datasets (perhaps including the inheritance tree of "huge incomplete snapshots" such as we can see now) might also allow us to destroy an unneeded dataset with less strain on the system, piece by piece. Perhaps even without causing a loop of kernel panics, wow! ;)

The way I see it, this feature would help solve (or work around) at least two problems. To me these problems are substantial; perhaps to others, like Paul, too. Because of the highly probable failure of a single unit of ZFS-SEND replication, I am bound to not use it at all. I also have to plan the destruction of datasets on my home rig (which was tainted with dedup) and expect weeks of downtime while the system is reset over and over to crawl through the blocks being released after a large delete...

//Jim
Jim Klimov
2011-Oct-30 19:14 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
2011-10-29 21:57, Jim Klimov wrote:
> ... In short, is it
> possible to add "restartability" to ZFS SEND, for example
> by adding artificial snapshots (of configurable increment
> size) into already existing datasets [too large to be
> zfs-sent successfully as one chunk of stream data]?

On a side note: would this feature, like any other nice-to-have feature in ZFS, require The Mythical Block Pointer Rewrite (TM)? For no apparent reason yet, I'm already afraid so ;)

If this is the Holy Grail which everybody craves and nobody has seen, what is really the problem with making it happen? Some time ago I skimmed through an overview of "what would have to be done for it". Not being a hardcore ZFS programmer, I did not grasp what is so fundamentally difficult about the quest. So I still wonder whether it is impossible, or whether anyone is already working on it quietly? ;)

//Jim
Paul Kraus
2011-Oct-31 14:38 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
On Sat, Oct 29, 2011 at 1:57 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> I am catching up with some 500 posts that I skipped this
> summer, and came up with a new question. In short, is it
> possible to add "restartability" to ZFS SEND, for example
> by adding artificial snapshots (of configurable increment
> size) into already existing datasets [too large to be
> zfs-sent successfully as one chunk of stream data]?

We addressed this by decreasing our snapshot interval from 1 day to 1 hour. We rarely have a snapshot bigger than a few GB now. I keep meaning to put together a snapshot script that takes a new snapshot when the amount of changed data reaches a certain point (for example, take a snapshot whenever the snapshot would contain 250 MB of data). Not enough round tuits with all the other broken stuff to fix :-(

-- 
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
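A rough sketch of the kind of script Paul describes - assuming a ZFS version that exposes the 'written' property (space changed since the most recent snapshot; this property postdates some of the platforms discussed here), with the dataset name, threshold, and polling interval as illustrative values:

    #!/bin/sh
    # Snapshot the dataset whenever ~250 MB has changed since the
    # last snapshot, checking once a minute.
    DS=tank/data
    THRESHOLD=$((250 * 1024 * 1024))    # bytes
    while true; do
        WRITTEN=$(zfs get -Hp -o value written "$DS")
        if [ "$WRITTEN" -ge "$THRESHOLD" ]; then
            zfs snapshot "$DS@auto-$(date +%Y%m%d-%H%M%S)"
        fi
        sleep 60
    done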
Matthew Ahrens
2011-Nov-05 01:22 UTC
[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level
On Sat, Oct 29, 2011 at 10:57 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> In short, is it
> possible to add "restartability" to ZFS SEND

In short, yes. We are working on it here at Delphix, and plan to contribute our changes upstream to Illumos. You can read more about it in the slides I link to in this blog post:

http://blog.delphix.com/matt/2011/11/01/zfs-10-year-anniversary/

--matt
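For the record, the resumable-send work Matt describes eventually shipped in OpenZFS; the interface looks roughly like this (a sketch only; dataset and host names are illustrative):

    # Receive with -s so that, if the stream is interrupted, the partial
    # state is kept along with a resume token instead of being destroyed.
    zfs send tank/data@snap1 | ssh backuphost zfs receive -s tank/copy

    # After an interruption, read the token from the receiving dataset...
    TOKEN=$(ssh backuphost zfs get -H -o value receive_resume_token tank/copy)

    # ...and restart the send from where it left off.
    zfs send -t "$TOKEN" | ssh backuphost zfs receive -s tank/copy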