Georep exposes another problem: when using Gluster as storage for VMs, each VM disk is saved as a qcow file. Changes happen inside the qcow file, so rsync has to sync the whole file every time.

A small workaround would be sharding, since rsync would only have to sync the changed shards, but I don't think this is a good solution.

On 23 Mar 2017 8:33 PM, "Joe Julian" <joe at julianfamily.org> wrote:

> In many cases, a full backup set is just not feasible. Georep to the same
> or a different DC may be an option if the bandwidth can keep up with the
> change set. If not, maybe break the data up into smaller, more manageable
> volumes where you only keep a smaller set of critical data and just back
> that up. Perhaps an object store (Swift?) might handle fault-tolerant
> distribution better for some workloads.
>
> There's no one right answer.
>
> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>
>> Backing up from inside each VM doesn't solve the problem.
>> If you have to back up 500 VMs you need more than one day, and what if
>> you have to restore the whole Gluster storage?
>>
>> How many days do you need to restore 1PB?
>>
>> Probably the only solution is georep to a similar cluster in the same
>> datacenter/rack, ready to become the master storage.
>> In that case you don't need to restore anything, as the data is already
>> there, only a little bit back in time, but this doubles the TCO.
>>
>> On 23 Mar 2017 6:39 PM, "Serkan Çoban" <cobanserkan at gmail.com> wrote:
>>
>>> Assuming a backup window of 12 hours, you need to send data at 25GB/s
>>> to the backup solution.
>>> Using 10G Ethernet on the hosts, you need at least 25 hosts to handle
>>> 25GB/s.
>>> You can create an EC Gluster cluster that can handle these rates, or
>>> you can just back up the valuable data from inside the VMs using open
>>> source backup tools like borg, attic, restic, etc.
>>>
>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>> <gandalf.corvotempesta at gmail.com> wrote:
>>>
>>>> Let's assume 1PB of storage full of VM images, with each brick on ZFS,
>>>> replica 3, sharding enabled.
>>>>
>>>> How do you backup/restore that amount of data?
>>>>
>>>> Backing up daily is impossible: you'll never finish one backup before
>>>> the next one starts (in other words, you need more than 24 hours).
>>>>
>>>> Restoring is even worse. You need more than 24 hours with the whole
>>>> cluster down.
>>>>
>>>> You can't rely on ZFS snapshots due to sharding (a snapshot taken from
>>>> one node is useless without the corresponding shards from all the
>>>> other nodes), and you still have the same restore speed.
>>>>
>>>> How do you back this up?
>>>>
>>>> Even georep isn't enough if you have to restore the whole storage in
>>>> case of disaster.
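A minimal sketch of the sharding workaround described above, assuming a hypothetical volume named "vmstore"; the 64MB block size is only an illustrative value, and sharding only applies to files written after it is enabled:

    # Enable sharding so each VM image is stored as a series of shard files
    # (hypothetical volume name "vmstore").
    gluster volume set vmstore features.shard on

    # Illustrative shard size: smaller shards mean finer-grained georep/rsync
    # transfers, at the cost of more backend files per image.
    gluster volume set vmstore features.shard-block-size 64MB

With sharding enabled, a sync only needs to transfer the shard files whose contents changed instead of the whole multi-gigabyte image.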
Maybe expose the volume as iSCSI and then put ZFS over iSCSI on each hypervisor? In that case I'd be able to take ZFS snapshots and send them to the backup server.

On 23 Mar 2017 8:36 PM, "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com> wrote:

> Georep exposes another problem: when using Gluster as storage for VMs,
> each VM disk is saved as a qcow file. Changes happen inside the qcow file,
> so rsync has to sync the whole file every time.
>
> A small workaround would be sharding, since rsync would only have to sync
> the changed shards, but I don't think this is a good solution.
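A minimal sketch of the snapshot-and-send approach suggested above, assuming a hypothetical ZFS pool "tank" created on top of the iSCSI LUN and a backup server reachable as "backuphost"; dataset and snapshot names are illustrative:

    # Point-in-time snapshot of the dataset holding the VM disks
    zfs snapshot tank/vmstore@2017-03-23

    # First run: full stream to the backup server
    zfs send tank/vmstore@2017-03-23 | ssh backuphost zfs receive backup/vmstore

    # Subsequent runs: incremental stream containing only the blocks changed
    # since the previous snapshot
    zfs snapshot tank/vmstore@2017-03-24
    zfs send -i tank/vmstore@2017-03-23 tank/vmstore@2017-03-24 \
        | ssh backuphost zfs receive backup/vmstore

The incremental stream avoids the whole-file rsync problem, though it means layering ZFS on top of a Gluster-backed LUN, with the extra failure modes that implies.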
I always use raw images. And yes, sharding would also be good.

On 03/23/17 12:36, Gandalf Corvotempesta wrote:

> Georep exposes another problem: when using Gluster as storage for VMs,
> each VM disk is saved as a qcow file. Changes happen inside the qcow file,
> so rsync has to sync the whole file every time.
>
> A small workaround would be sharding, since rsync would only have to sync
> the changed shards, but I don't think this is a good solution.
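A minimal illustration of the raw-image choice above (paths and size are hypothetical): with a raw image, a guest write lands at a fixed offset in the backing file, so shard- or block-level replication only has to move the regions that actually changed, whereas qcow2 interposes its own cluster allocation inside the file:

    # Sparse raw image: guest block offsets map 1:1 to offsets in this file
    qemu-img create -f raw /var/lib/libvirt/images/vm01.img 100G

    # qcow2 equivalent: allocation tables inside the file mean the same guest
    # write may land at a different file offset over time
    qemu-img create -f qcow2 /var/lib/libvirt/images/vm01.qcow2 100G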