Hi there,

I've got a question that I'm sure has been addressed somewhere, so sorry
if I'm asking the same question twice, but here goes:

I've currently got two Linux machines running drbd (remote device mirror)
and it's working perfectly, but I'd love to use ZFS (* I <heart> ZFS *).
Alas, I don't seem to see any information on remote mirroring other than
a blog I've found about using NFS to export the device:

http://blogs.sun.com/roller/page/chrisg?entry=zfs_remote_replication
http://blogs.sun.com/roller/page/chrisg?entry=more_with_the_zfs_external

The pages describe an idea that seems somewhat fiddly, and I'd rather not
trust it in a production-type environment. Anyone have any more 'info'
for me?

P
--
Patrick
----------------------------------------
patrick <at> eefy <dot> net
On Mon, 2006-05-08 at 14:27 +0200, Patrick wrote:
> Hi there,

Hello.

> I've got a question that I'm sure has been addressed somewhere, so
> sorry if I'm asking the same question twice, but here goes:
>
> I've currently got two Linux machines running drbd (remote device
> mirror) and it's working perfectly, but I'd love to use ZFS (* I
> <heart> ZFS *). Alas, I don't seem to see any information on remote
> mirroring other than a blog I've found about using NFS to export the
> device:
>
> http://blogs.sun.com/roller/page/chrisg?entry=zfs_remote_replication
> http://blogs.sun.com/roller/page/chrisg?entry=more_with_the_zfs_external
>
> The pages describe an idea that seems somewhat fiddly, and I'd rather
> not trust it in a production-type environment. Anyone have any more
> 'info' for me?

Well, this can be a pretty deep topic for a Monday morning :-)

In my experience, the approach and solution for "remote mirroring"
really depend on two things:

1. Are you doing disaster recovery, versus mirroring diversity?
2. How far apart are the mirrored devices?

Without answering those questions first, you risk a suboptimal solution.

 -- richard
> Hello.

Howdy! :)

> Well, this can be a pretty deep topic for a Monday morning :-)

Well, Monday evening for some of us ;)

> In my experience, the approach and solution for "remote mirroring"
> really depend on two things:
> 1. Are you doing disaster recovery, versus mirroring diversity?

I'm not actually sure. I'm currently mirroring from one disk device to
the other over the network to cater for hardware failures and "software"
failures (such as a kernel panic); the idea was to have an 'offline'
machine that would hold a full copy of the data. How useful that would
be in the real world is still to be decided. Currently I've also got it
clipped into a few other bits like Heartbeat, so it'll do a full
failover and move, but I'm probably going to remove that because of the
'extra layer of complexity creating more complex problems'.

So I suppose that'd put me into the 'disaster recovery' class.

> 2. How far apart are the mirrored devices?

About 15cm-20cm (via crossover on separate interfaces, not network
osmosis). (v20z's, btw.)

> Without answering those questions first, you risk a suboptimal
> solution.

Anything I missed?

Patrick
On Mon, May 08, 2006 at 02:27:05PM +0200, Patrick wrote:
> I've currently got two Linux machines running drbd (remote device
> mirror) and it's working perfectly, but I'd love to use ZFS (* I
> <heart> ZFS *). Alas, I don't seem to see any information on remote
> mirroring other than a blog I've found about using NFS to export the
> device:
>
> http://blogs.sun.com/roller/page/chrisg?entry=zfs_remote_replication
> http://blogs.sun.com/roller/page/chrisg?entry=more_with_the_zfs_external
>
> The pages describe an idea that seems somewhat fiddly, and I'd rather
> not trust it in a production-type environment. Anyone have any more
> 'info' for me?

There are two basic methods of doing remote replication: remote
mirroring and asynchronous updates.

Remote mirroring can be done through iSCSI/zvols, though the failover
case is a little awkward (you'll be doing ZFS on top of zvols for local
access). It also implies synchronous operation for every write, which
would slow down local access. The remote data is also not available
(even read-only) until you perform a failover, because mirroring occurs
at the block level and the upper layers cannot stay in sync.

Asynchronous remote replication can be done today with 'zfs send' and
'zfs receive', though it needs some more work to be truly useful. It has
the property that it doesn't tax local activity, but your data will be
slightly out of sync (depending on how often you sync your data,
preferably every few minutes).

Among the things we're working on to make this easier:

- recursive snapshots ('zfs snapshot -r')
- recursive send ('zfs send -r')
- ability to send properties ('zfs send -p')
- read-only receive on the remote end (currently the fs has to be unmounted)
- ability to receive into the root dataset

Once these are implemented, it should be fairly easy to construct your
own cron job to do regular remote replication using any transport you'd
like (probably SSH). The tricky part comes when local churn outpaces
regular replication. Do you want to guarantee your time delta (by
slowing down local access somehow), or hope that the remote end will
catch up (disabling more attempts in the process)?

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
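[The basic send/receive mechanics Eric refers to work today and look
roughly like this - a minimal sketch; the pool name, snapshot labels,
and remote host are hypothetical:

  # Seed the replica once with a full stream:
  zfs snapshot tank/fs@monday
  zfs send tank/fs@monday | ssh replica zfs receive tank/fs

  # Later, ship only the changes since the last common snapshot:
  zfs snapshot tank/fs@tuesday
  zfs send -i tank/fs@monday tank/fs@tuesday | ssh replica zfs receive tank/fs

As Eric's list notes, the receiving filesystem currently has to stay
unmounted and unmodified between incremental receives, or the next
'zfs receive' will fail.]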
On Tue, 2006-05-09 at 00:49 +0200, Patrick wrote:
> > In my experience, the approach and solution for "remote mirroring"
> > really depend on two things:
> > 1. Are you doing disaster recovery, versus mirroring diversity?
>
> I'm not actually sure. I'm currently mirroring from one disk device
> to the other over the network to cater for hardware failures and
> "software" failures (such as a kernel panic); the idea was to have an
> 'offline' machine that would hold a full copy of the data. [...]
>
> So I suppose that'd put me into the 'disaster recovery' class.

No. You are describing a more common mirroring diversity setup.

> > 2. How far apart are the mirrored devices?
>
> About 15cm-20cm (via crossover on separate interfaces, not network
> osmosis). (v20z's, btw.)

Definitely not disaster recovery... if a tornado hit one box, it would
likely also hit the other. Also, in disaster recovery scenarios we often
consider a required time delay between committing data on the primary
versus the secondary, in order to protect against accidental data loss
(eg. rm *).

There are a couple of ways to do this, some of which aren't quite ready
for release and aren't part of the OpenSolaris tree yet.

The Sun StorageTek Availability Suite software provides block-device
level replication and should work out of the box with ZFS (I dunno for
sure, but since they operate at different levels in the stack, they
should interoperate OK).
http://www.sun.com/storagetek/management_software/data_protection/availability/

For a real cluster solution, which it sounds like you don't want, Sun
Cluster will have failover file system support for ZFS in an upcoming
release, much like what currently exists for other file systems (UFS,
QFS, VxFS).

It seems to me that drbd is a compromise between the previous two. I
would say that they are doing the easy parts, but putting off the hard
parts.

For a more point-in-time snapshot solution, you could use zfs
send/receive.

There has also been a discussion of the requirements for a multi-node
ZFS implementation. The window for comments may still be open; see the
ZFS discuss forum archive for the threads.

 -- richard
Eric Schrock wrote:

> ...
> Asynchronous remote replication can be done today with 'zfs send' and
> 'zfs receive', though it needs some more work to be truly useful. It has
> the property that it doesn't tax local activity, but your data will be
> slightly out of sync (depending on how often you sync your data,
> preferably every few minutes).

Is it possible to add "tail -f"-like properties to 'zfs send'?

I suppose what I'm thinking of for 'zfs send -f' would be to send down
all of the transactions that update a ZFS data set, both the metadata
and the data.

The catch here would be to start the 'zfs send -f' at the same time as
the filesystem came online so that there weren't any transactional gaps.

Thoughts?

Darren
On Tue, May 09, 2006 at 01:33:33PM -0700, Darren Reed wrote:
> Is it possible to add "tail -f"-like properties to 'zfs send'?
>
> I suppose what I'm thinking of for 'zfs send -f' would be to send
> down all of the transactions that update a ZFS data set, both the
> metadata and the data.

'zfs send' always sends all the changes, including metadata and data.

> The catch here would be to start the 'zfs send -f' at the same time
> as the filesystem came online so that there weren't any transactional
> gaps.

You can always simply run 'zfs snapshot; zfs send -i ... | ssh ...' in a
loop. This is an implementation of best-effort remote replication.

Perhaps you're looking for a more real-time remote replication. See:

  5036182 want remote replication (intent-log based)

Another possible remote replication implementation would allow the
administrator to put a bound on how far the remote side can be out of
date (eg. by an amount of time, or an amount of modified data). This
could be implemented by using 'zfs send -i', with some hooks to stall
changes to the filesystem if the 'zfs send -i' gets too far behind.

--matt
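[Concretely, the best-effort loop Matt describes might look something
like this sketch - the filesystem, host, and five-minute interval are
assumptions, and real use would want error handling:

  #!/bin/sh
  # Best-effort replication loop: snapshot, send the delta, wait, repeat.
  # Assumes tank/fs@0 already exists on host "replica", seeded with a
  # full 'zfs send tank/fs@0 | ssh replica zfs receive tank/fs'.
  i=0
  while :; do
          j=`expr $i + 1`
          zfs snapshot tank/fs@$j
          zfs send -i tank/fs@$i tank/fs@$j | ssh replica zfs receive tank/fs
          zfs destroy tank/fs@$i        # the replica no longer needs @$i
          i=$j
          sleep 300                     # data is at most ~5 minutes stale
  done
]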
On Tue, May 09, 2006 at 01:33:33PM -0700, Darren Reed wrote:
> Is it possible to add "tail -f"-like properties to 'zfs send'?
>
> I suppose what I'm thinking of for 'zfs send -f' would be to send
> down all of the transactions that update a ZFS data set, both the
> metadata and the data.
>
> The catch here would be to start the 'zfs send -f' at the same time
> as the filesystem came online so that there weren't any transactional
> gaps.
>
> Thoughts?

+1

Add to this some churn/replication throttling, and you may not want just
a command-line interface but a library as well.

E.g., if the stdout/remote connection of 'zfs send -f' blocked for long
or broke, then ZFS should snapshot at the latest TXG and hold on to that
snapshot until the output could drain and/or the connection be restored,
then resume by sending the incremental from the current TXG to that
snapshot...

Nico
--
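[Until something like that exists in the kernel, the hold-and-retry
behaviour Nico sketches can be approximated in userland with today's
snapshot-based send/receive, since a snapshot pins its data until the
stream drains, and a failed 'zfs receive' leaves the destination
unchanged, so retrying is safe. A rough sketch; all names hypothetical:

  #!/bin/sh
  # Keep the newest snapshot until the replica acknowledges it, so a
  # broken connection just means retrying the same incremental stream.
  # LAST is the most recent snapshot known to exist on both sides.
  LAST=rep0
  NEXT=rep1
  zfs snapshot tank/fs@$NEXT
  until zfs send -i tank/fs@$LAST tank/fs@$NEXT | \
        ssh replica zfs receive tank/fs; do
          sleep 60                # output blocked or link down; retry
  done
  zfs destroy tank/fs@$LAST       # replica has @$NEXT; @$LAST can go
]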
Matthew Ahrens wrote:

> On Tue, May 09, 2006 at 01:33:33PM -0700, Darren Reed wrote:
> > Is it possible to add "tail -f"-like properties to 'zfs send'?
> >
> > I suppose what I'm thinking of for 'zfs send -f' would be to send
> > down all of the transactions that update a ZFS data set, both the
> > metadata and the data.
>
> 'zfs send' always sends all the changes, including metadata and data.
>
> > The catch here would be to start the 'zfs send -f' at the same time
> > as the filesystem came online so that there weren't any transactional
> > gaps.
>
> You can always simply run 'zfs snapshot; zfs send -i ... | ssh ...' in
> a loop. This is an implementation of best-effort remote replication.
>
> Perhaps you're looking for a more real-time remote replication. See:
>
>   5036182 want remote replication (intent-log based)

Yes, I think this is what I was thinking of. If I could add a vote to it
via opensolaris.org, I would :)

Would it be possible to specify more than one remote destination for a
single replication? Or could you chain replication together, so that I
have:

hosta# zfs send -f | rsh hostb zfs receive -f
hostb# zfs send -f | rsh hostc zfs receive -f

The second would feed off the intent-log updates from the input of the
first, allowing for cascaded remote replication.

> Another possible remote replication implementation would allow the
> administrator to put a bound on how far the remote side can be out of
> date (eg. by an amount of time, or an amount of modified data). This
> could be implemented by using 'zfs send -i', with some hooks to stall
> changes to the filesystem if the 'zfs send -i' gets too far behind.

That's a great idea too :)

Darren
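[Until an intent-log-based 'zfs send -f' exists, cascading of this sort
can be approximated with the snapshot-based tools, because a received
snapshot exists on the middle host and can itself be sent onward. A
sketch; the hosts, pool, and snapshot names are hypothetical:

  hosta# zfs snapshot tank/fs@rep1
  hosta# zfs send -i tank/fs@rep0 tank/fs@rep1 | ssh hostb zfs receive tank/fs

  # hostb now has tank/fs@rep1 and can forward the same increment:
  hostb# zfs send -i tank/fs@rep0 tank/fs@rep1 | ssh hostc zfs receive tank/fs
]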
On Tue, 9 May 2006, Nicolas Williams wrote:

> Add to this some churn/replication throttling, and you may not want
> just a command-line interface but a library as well.
>
> E.g., if the stdout/remote connection of 'zfs send -f' blocked for
> long or broke, then ZFS should snapshot at the latest TXG and hold on
> to that snapshot until the output could drain and/or the connection be
> restored, then resume by sending the incremental from the current TXG
> to that snapshot...

While I agree that zfs send is incredibly useful, after reading this
post I'm asking myself:

a) This already sounds like we're descending the slippery slope of
'checkpointing' - which is an incredibly hard problem to solve and
involves considerable hardware/software resources. The only (arguably)
successful implementation of checkpointing that I know about is the
Burroughs B7700 stack-based mainframe - where every process is a stack,
and checkpointing consisted of taking a snapshot of the stack that
represents the processes and moving it to other (mirror) hardware. Much
of this is implemented in hardware to offset the excessively high "cost"
of such operations.

b) You can never successfully checkpoint an application via data
replication. Why? Because, at some point, you're trying to take a
snapshot of a process (or related processes) that modifies multiple
files representing inter-related data. That is what we have relational
databases for, and the concept of:

  begin_transaction
    do blah op a
    do blah op b
    do blah op c
  end_transaction

If anything goes wrong with operation a, b or c, you want to back out
the entire transaction. If remote data replication could be implemented
successfully, you would not need begin_transaction ... end_transaction
semantics, or (to spend the $s on) an RDBMS. Stated in different terms:
if remote replication resolved the issue of maintaining application
state, then one could simply replicate the underlying files that
represent an Oracle or MySQL database and be done with application/site
failover. Buzzzzz ... loser. Not possible.

The real issue is: where do you draw the line? And how do you manage
user expectations if the user is convinced that, by mirroring the active
filesystem, they have achieved site diversity/failover?

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
On Tue, May 09, 2006 at 04:56:05PM -0500, Al Hopper wrote:
> While I agree that zfs send is incredibly useful, after reading this
> post I'm asking myself:
>
> a) This already sounds like we're descending the slippery slope of
> 'checkpointing' - which is an incredibly hard problem to solve and
[...]

This is replication, not checkpointing.

> b) You can never successfully checkpoint an application via data
> replication. Why? Because, at some point, you're trying to take a
> snapshot of a process (or related processes) that modifies multiple
> files representing inter-related data. That is what we have relational
> databases for, and the concept of:

But if zfs send/receive can also send snapshots - that is, if I create a
snapshot on a filesystem under replication, the same snapshot should
show up at the replica - then we place the checkpointing burden on the
application/administrator.

> The real issue is: where do you draw the line?

I think live replication could be incredibly useful, lags and all,
because not everything you might replicate involves complex state, much
of what does supports journalling/rollback anyway, and you can do your
own snapshotting to deal with checkpointing. In that case live
replication merely spreads the cost of replication around, instead of
making it bursty (though it may be less efficient, since some of the
churn will be redundant).

> And how do you manage user expectations if the user is convinced that,
> by mirroring the active filesystem, they have achieved site
> diversity/failover?

How is remote replication in this regard different from local mirroring?

Nico
--