I always use raw images. And yes, sharding would also be good.

On 03/23/17 12:36, Gandalf Corvotempesta wrote:
> Georep exposes another problem: when using Gluster as VM storage, the
> VM file is saved as a qcow image. Changes happen inside the qcow file,
> so rsync has to sync the whole file every time.
>
> A workaround would be sharding, since rsync would then only have to
> sync the changed shards, but I don't think this is a good solution.
>
> On 23 Mar 2017 8:33 PM, "Joe Julian" <joe at julianfamily.org> wrote:
>
>> In many cases, a full backup set is just not feasible. Georep to the
>> same or a different DC may be an option if the bandwidth can keep up
>> with the change set. If not, consider breaking the data up into
>> smaller, more manageable volumes, keeping only a smaller set of
>> critical data and backing up just that. Perhaps an object store
>> (Swift?) might handle fault-tolerant distribution better for some
>> workloads.
>>
>> There's no one right answer.
>>
>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>> Backing up from inside each VM doesn't solve the problem. If you
>>> have to back up 500 VMs, you need more than a day, and what if you
>>> have to restore the whole Gluster storage?
>>>
>>> How many days do you need to restore 1 PB?
>>>
>>> Probably the only solution is georep to a similar cluster in the
>>> same datacenter/rack, ready to become the master storage. In that
>>> case you don't need to restore anything, since the data is already
>>> there, only a little behind in time, but this doubles the TCO.
>>>
>>> On 23 Mar 2017 6:39 PM, "Serkan Çoban" <cobanserkan at gmail.com>
>>> wrote:
>>>
>>>> Assuming a backup window of 12 hours, you need to send data to the
>>>> backup solution at 25 GB/s. Using 10G Ethernet on the hosts, you
>>>> need at least 25 hosts to handle 25 GB/s. [The arithmetic is worked
>>>> out after this message.] You can build an EC (erasure-coded)
>>>> Gluster cluster that can handle these rates, or just back up the
>>>> valuable data from inside the VMs using open-source backup tools
>>>> like borg, attic, restic, etc.
>>>>
>>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>> <gandalf.corvotempesta at gmail.com> wrote:
>>>>> Let's assume 1 PB of storage full of VM images, with each brick on
>>>>> ZFS, replica 3, sharding enabled.
>>>>>
>>>>> How do you backup/restore that amount of data?
>>>>>
>>>>> Backing up daily is impossible: you'd never finish one backup
>>>>> before the next one starts (in other words, you need more than
>>>>> 24 hours).
>>>>>
>>>>> Restoring is even worse: you need more than 24 hours with the
>>>>> whole cluster down.
>>>>>
>>>>> You can't rely on ZFS snapshots because of sharding (a snapshot
>>>>> taken from one node is useless without the related shards on all
>>>>> the other nodes), and you still have the same restore speed.
>>>>>
>>>>> How do you back this up?
>>>>>
>>>>> Even georep isn't enough if you have to restore the whole storage
>>>>> in case of disaster.
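The 25 GB/s figure quoted above is easy to sanity-check. A minimal
sketch in Python, assuming 1 PB means 10**15 bytes and that a 10GbE
host moves at most 1.25 GB/s; the constants are illustrative, not from
the thread:

    # Back-of-the-envelope check of the backup-window arithmetic.
    TOTAL_BYTES = 10**15              # 1 PB of VM images (assumption)
    WINDOW_SECONDS = 12 * 3600        # 12-hour backup window
    HOST_BYTES_PER_SEC = 10e9 / 8     # 10 Gbit/s ~= 1.25 GB/s per host

    required = TOTAL_BYTES / WINDOW_SECONDS   # bytes/s to fit the window
    hosts = required / HOST_BYTES_PER_SEC     # hosts at full line rate

    print(f"required throughput: {required / 1e9:.1f} GB/s")  # ~23.1 GB/s
    print(f"10GbE hosts at line rate: {hosts:.1f}")           # ~18.5

The raw division gives roughly 23 GB/s and 19 hosts; rounding the target
up to 25 GB/s and leaving headroom for protocol overhead and replica
traffic is what gets you to the "at least 25 hosts" quoted above.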
Raw or qcow doesn't change anything about the backup: georep still has
to sync the whole file. Additionally, raw images have far fewer features
than qcow. [A sketch of shard-level syncing follows this message.]

On 23 Mar 2017 8:40 PM, "Joe Julian" <joe at julianfamily.org> wrote:

> I always use raw images. And yes, sharding would also be good.
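To make the sharding point concrete: a minimal sketch of shard-granular
change detection, the property that makes sharded volumes friendlier to
rsync/georep than one monolithic image file. The shard size matches
Gluster's default shard-block-size, but the helper names and the digest
comparison are illustrative assumptions, not Gluster's actual
replication mechanism:

    import hashlib
    from pathlib import Path

    SHARD_SIZE = 64 * 1024 * 1024  # 64 MB, Gluster's default shard size

    def shard_digests(image: Path) -> list[str]:
        """Hash every fixed-size shard of a VM image."""
        digests = []
        with image.open("rb") as f:
            while chunk := f.read(SHARD_SIZE):
                digests.append(hashlib.sha256(chunk).hexdigest())
        return digests

    def shards_to_send(previous: list[str], current: list[str]) -> list[int]:
        """Indices of shards that changed since the last sync run."""
        changed = [i for i, (old, new) in enumerate(zip(previous, current))
                   if old != new]
        changed += range(len(previous), len(current))  # appended shards
        return changed

With whole-file sync, one dirty byte inside a 100 GB image re-sends all
100 GB; with 64 MB shards, a guest that dirtied three shards re-sends
about 192 MB. Gandalf's objection above still stands, though: shard-level
sync bounds the transfer size, but it doesn't by itself give you a
consistent point-in-time backup.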