Keith Freedman
2009-Jan-12 13:55 UTC
[Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
I just wanted to toss out a thought I had to get it on the table.

For me, the replication features (in any filesystem that supports them) serve several purposes.

One is to have two or more copies of the data which are live and usable (I think Lustre doesn't offer this) -- this is handy for HA, and for performance (in my case, the servers are also clients, so they read data from their local disk and only have to drop to network speed when writing).

Another is disaster recovery.

What I'd like to see is a DR translator, which would be basically identical to AFR with a few notable exceptions:

1) It would be a one-way pipe: when data is updated, the updates are pushed over, and it's assumed that the DR location is never written to locally, so the auto-healing can make some assumptions and not have to do a two-way comparison and data transfer.

2) Delayed writes: I'd like to specify an allowable delay for updates. If this is 0, my writes block waiting on the data to be replicated; if it's higher, gluster returns control as soon as it has written the file to the "local brick" and then replicates in the background.

3) Delayed writes, part two: if we're allowing delayed writes, there may be an added benefit. If the same file changes multiple times over a short period, we only have to transfer the most recent version of that data across the network.

So one could have a disaster recovery site on a slower Internet connection which is in sync within a specified amount of time. Or one could even use a service like Amazon S3 as a repository without worrying about huge data transfer fees.

I could also see it used to manage a file-serving/web farm. For example, I might have 7 machines which just serve images and videos. I update them by pushing a new image/video to one master server, and the other 6 get updated.

If someone updated a file on the DR box (i.e. the auto-heal would be triggered), then instead of the file on the DR box being replicated back, it should be overwritten with the version of the file on the master.

This would ensure data integrity, and you could put your master copy of the files on a very hardened, secure server behind a firewall or in a DMZ. If someone breaks into a box and tries to overwrite an image or something, it would automatically get 'healed' from the master copy.

Then, if there is a disaster and you're running from your DR site, you simply reverse the configuration; after the disaster, let things auto-heal in the other direction and switch back once things are in sync.

Those are my thoughts.
Keith
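P.S. To make the idea a bit more concrete, here is the sort of volume spec I'm imagining. The cluster/dr translator and every option under it are pure invention on my part -- only the general volfile shape is real glusterfs syntax -- so treat this as a sketch of the wish, not anything that works today:

# local brick that applications read and write at local-disk speed
volume local
  type storage/posix
  option directory /data/export
end-volume

# remote DR brick, reached over the WAN
volume dr-remote
  type protocol/client
  option transport-type tcp        # may need to be tcp/client depending on glusterfs version
  option remote-host dr.example.com
  option remote-subvolume brick
end-volume

# hypothetical one-way, delayed replication translator (wish-list only)
volume dr
  type cluster/dr                  # does not exist
  option master local              # writes land here first
  option replica dr-remote         # pushed here in the background, never healed back
  option max-delay 300             # seconds of allowed lag; 0 would mean synchronous
  option coalesce-writes on        # only ship the newest version of a hot file
  subvolumes local dr-remote
end-volume

With max-delay 0 this would behave like AFR; with a few minutes of slack it would be more like a continuously running one-way rsync.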
Krishna Srinivas
2009-Jan-18 08:42 UTC
[Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
Keith,

We had a discussion about a translator with functionality similar to what you have described. We termed it the "backup" translator, i.e. a translator which does delayed replication. This gives better response to the application. We can make a lot of assumptions, for example that the backup directory will not be written to while the primary copy is up. We have not given it too much thought as of now, but it is definitely on our minds too.

Regards
Krishna
Keith Freedman
2009-Jan-20 04:44 UTC
[Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
At 12:42 AM 1/18/2009, Krishna Srinivas wrote:
>Keith,
>
>We had a discussion about a translator with functionality similar to
>what you have described. We termed it the "backup" translator, i.e. a
>translator which does delayed replication. This gives better response
>to the application. We can make a lot of assumptions, for example that
>the backup directory will not be written to while the primary copy is
>up. We have not given it too much thought as of now, but it is
>definitely on our minds too.

My guess is it wouldn't be a huge deviation from the HA translator as it will hopefully soon exist. This would be wonderful for wide-area replication over networks which aren't always up, or just for doing nightly/weekly backups on a schedule.

So hopefully it can make it onto the roadmap; that would be good. But I realize these things all take time, and working on this would mean something else doesn't get done.
Prabhu Ramachandran
2009-Jan-25 17:12 UTC
[Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
Keith Freedman wrote:
> For me, the replication features (in any filesystem that supports them)
> serve several purposes.
>
> One is to have two or more copies of the data which are live and usable
> (I think Lustre doesn't offer this) -- this is handy for HA, and for
> performance (in my case, the servers are also clients, so they read
> data from their local disk and only have to drop to network speed when
> writing).

Just to second that: I'd really appreciate this too. I've been trying to set up a couple of machines, each with a set of partitions, on a not-too-reliable 100 Mbit/s network, mirroring the same data. Thus far I've been managing this with scripts that run rsync.

With glusterfs I used unify for the 4 partitions on each machine and then afr'd the two unified volumes, but I was told that this is not a reliable way of doing things and that I'd run into problems when one host goes down.

What you are talking about seems to satisfy exactly what I am trying to do and would be very convenient. If there is a way to do this currently with glusterfs, I'd definitely like to hear about it.

cheers,
prabhu
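P.S. For concreteness, the client volume file I've been experimenting with has roughly this shape. Host names, export paths and the scheduler are simplified, and I've only spelled out one of machine-a's four bricks; the other bricks and machine-b's side are defined the same way:

# one of machine-a's four exported partitions (brick-a2..brick-a4 are defined the same way)
volume brick-a1
  type protocol/client
  option transport-type tcp            # tcp/client on older glusterfs releases
  option remote-host machine-a
  option remote-subvolume posix-a1
end-volume

# namespace brick required by unify, also exported from machine-a
volume ns-a
  type protocol/client
  option transport-type tcp
  option remote-host machine-a
  option remote-subvolume posix-ns-a
end-volume

# all of machine-a's partitions presented as a single volume
volume unify-a
  type cluster/unify
  option scheduler rr
  option namespace ns-a
  subvolumes brick-a1 brick-a2 brick-a3 brick-a4
end-volume

# unify-b is built the same way from machine-b's bricks and namespace

# mirror the two unified volumes
volume mirror
  type cluster/afr
  subvolumes unify-a unify-b
end-volume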
Keith Freedman
2009-Jan-26 10:01 UTC
[Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
At 09:12 AM 1/25/2009, Prabhu Ramachandran wrote:
>Just to second that: I'd really appreciate this too. I've been trying to
>set up a couple of machines, each with a set of partitions, on a
>not-too-reliable 100 Mbit/s network, mirroring the same data. Thus far
>I've been managing this with scripts that run rsync.
>
>With glusterfs I used unify for the 4 partitions on each machine and
>then afr'd the two unified volumes, but I was told that this is not a
>reliable way of doing things and that I'd run into problems when one
>host goes down.

It depends on what happens when a host goes down.

If the issue is "server crashed", then you should be fine doing this with the gluster/HA translator. As long as you unify bricks that are all on one server, and AFR that against a unify of bricks that are all on the other server, then if one server is down AFR will only use the unify brick of the server that is up. When the other server comes back up, things will auto-heal onto the server that was down. (See the skeleton config below for the shape I mean.)

If your problem is that a server is temporarily down because of an unstable network connection, then you have a more difficult problem. If the network connection fails and comes back up over short periods of time, you'll always be experiencing delays: gluster is often waiting for timeouts, then the server is visible again and it auto-heals, then it's not visible and it has to time out again. It'll likely work just fine, but it will seem pretty slow (though no more so than an NFS mount behind a faulty connection, I suppose).

Things can be further complicated if some clients can see SERVER 1 while other clients can only see SERVER 2. If this happens, you will increase the likelihood of a split-brain situation, and things will go wrong when it tries to auto-heal (most likely requiring manual intervention to get back to a stable state).

So the replication features of gluster/HA will most likely solve your problem. If you have specific concerns, post a volume config to the group so people can advise you on a specific configuration.

Keith
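P.S. The skeleton I mean looks roughly like this. Server names and brick/volume names are placeholders, and I've left out the protocol/client and namespace volume definitions that each unify needs underneath it:

# everything exported by server1, grouped into one volume
volume unify-server1
  type cluster/unify
  option scheduler rr
  option namespace ns-server1          # namespace brick also lives on server1
  subvolumes server1-brick1 server1-brick2 server1-brick3 server1-brick4
end-volume

# everything exported by server2, grouped the same way
volume unify-server2
  type cluster/unify
  option scheduler rr
  option namespace ns-server2          # namespace brick also lives on server2
  subvolumes server2-brick1 server2-brick2 server2-brick3 server2-brick4
end-volume

# mirror the two per-server volumes; losing a server takes out exactly one
# AFR subvolume, so the other keeps serving and the downed side gets
# auto-healed when it comes back
volume mirror
  type cluster/afr
  subvolumes unify-server1 unify-server2
end-volume

The important part is that each unify only ever contains bricks from a single server. If you mix servers inside a unify, losing one host leaves both AFR subvolumes half-broken, instead of one whole and one gone.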