thr3ads.net - CentOS - [CentOS] KVM vs. incremental remote backups [Mar 2021]

Nicolas Kovacs

2021-Mar-31 12:41 UTC

[CentOS] KVM vs. incremental remote backups

Hi,

Up until recently I've hosted all my stuff (web & mail) on a handful of bare
metal servers. Web applications (WordPress, OwnCloud, Dolibarr, GEPI,
Roundcube) as well as mail and a few other things were hosted mostly on one big
machine.

Backups for this setup were done using Rsnapshot, a nifty utility that combines
Rsync over SSH and hard links to make incremental backups.
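
To illustrate the hard-link idea behind this kind of incremental backup, here is a minimal local sketch in Python. It is not how Rsnapshot is implemented (Rsnapshot drives rsync); the function name `snapshot` and the directory names `daily.0` / `daily.1` in the demo are only illustrative assumptions. Files that are unchanged since the previous snapshot are hard-linked, so each snapshot looks complete but only changed files consume new space.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = target / rel / name
            old = (previous / rel / name) if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the inode, no extra space
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        data = base / "data"
        data.mkdir()
        (data / "unchanged.txt").write_text("same every day")
        (data / "changing.txt").write_text("monday")
        snapshot(data, None, base / "daily.1")
        (data / "changing.txt").write_text("tuesday")
        snapshot(data, base / "daily.1", base / "daily.0")
        for name in ("unchanged.txt", "changing.txt"):
            shared = os.path.samefile(base / "daily.1" / name, base / "daily.0" / name)
            print(name, "shares storage with previous snapshot:", shared)
```

Both snapshot directories must live on the same filesystem, since hard links cannot cross filesystems.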

This approach has become problematic, for several reasons. First, web
applications have increasingly specific and sometimes mutually exclusive
requirements. And second, last month I had a server crash, and even though I
had backups for everything, this meant quite some offline time.

So I've opted to go for KVM-based solutions, with everything split up over a
series of KVM guests. I wrapped my head around KVM, played around with it (a
lot) and now I'm more or less ready to go.

One detail is nagging me though: backups.

Let's say I have one VM that handles only DNS (base installation + BIND) and
one other VM that handles mail (base installation + Postfix + Dovecot).

Under the hood that's two QCOW2 images stored in /var/lib/libvirt/images.

With the old "bare metal" approach I could perform remote backups using Rsync,
so only the difference between two backups would get transferred over the
network. Now with KVM images it looks like every day I have to transfer the
whole image again. As soon as some images have lots of data on them (say, 100
GB for a small OwnCloud server), this quickly becomes unmanageable.

I googled around quite some time for "KVM backup best practices" and was a bit
puzzled to find many folks asking the same question and no real answer, at
least not without having to jump through burning hoops.

Any suggestions ?

Niki

-- 
Microlinux - Solutions informatiques durables
7, place de l'Église - 30730 Montpezat
Site : https://www.microlinux.fr
Blog : https://blog.microlinux.fr
Mail : info at microlinux.fr
Tél. : 04 66 63 10 32
Mob. : 06 51 80 12 12

Stephen John Smoogen

2021-Mar-31 13:10 UTC

For Fedora Infrastructure we use a three-pronged approach:
1. Kickstarts for the basic system
2. Ansible for the deployment and 'general configuration management'
3. rdiff-backup of things which Ansible would not be able to bring back.

Most of our infrastructure is KVM guests, and the only systems we have to
kickstart by 'hand' are the bare metal hosts. The guests are fired off with
an Ansible playbook which uses libvirt to start the initial guest and
kickstart it from known data. The playbook then continues and builds out the
system for the rest of the deployment. [Our guests are also usually on LVM
volumes, so we can use LVM tools to snapshot the system in different ways.]

After that is done, there are usually scripts which do things like ASCII
dumps of databases and such.

As you pointed out, this isn't the only way to do it. Other sites have a
master QEMU image for all their guests on a machine and clone that instead
of doing kickstarts for each. They also take snapshots of the images via LVM
or some other tool in order to make backups that way.

Hope this helps.



-- 
Stephen J Smoogen.

Robert Heller

2021-Mar-31 13:19 UTC

What *I* do for backing up KVM VMs is that I use LVM volumes, not QCOW2
images.  I take an LVM "snapshot" volume, mount it locally and read-only
on the host, and use tar (via Amanda).  Another option is to install
Amanda's client on the VM itself and have Amanda run tar on the VM.  I use
the latter to deal with VMs that have a FS that is not mountable on the host
(usually due to ext4 version issues -- CentOS 6's mount.ext4 did not like
Ubuntu 18.04's ext4 fs).  I have always found using container image files
with VMs a bit too opaque.
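
The snapshot / mount read-only / tar sequence can be sketched roughly as follows. This is only an outline under assumptions: the volume group, volume, mount point and archive names are hypothetical, plain tar stands in for the Amanda-driven tar, and the logical volume is assumed to hold a filesystem directly (a volume containing a partition table would need an extra mapping step). The Python function only builds the command list, so it can be reviewed before anything is run as root.

```python
import shlex
from typing import List


def snapshot_backup_plan(vg: str, lv: str, mountpoint: str, archive: str,
                         snap_size: str = "5G") -> List[List[str]]:
    """Return the commands for one snapshot-based backup of /dev/<vg>/<lv>."""
    snap = f"{lv}_backup_snap"          # hypothetical snapshot name
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. Freeze a point-in-time view of the guest's volume.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        # 2. Mount the snapshot read-only on the host.
        ["mount", "-o", "ro", snap_dev, mountpoint],
        # 3. Archive the mounted filesystem.
        ["tar", "-czf", archive, "-C", mountpoint, "."],
        # 4. Clean up so the snapshot does not fill up over time.
        ["umount", mountpoint],
        ["lvremove", "--yes", snap_dev],
    ]


if __name__ == "__main__":
    plan = snapshot_backup_plan("vg0", "mailvm", "/mnt/snap", "/backup/mailvm.tar.gz")
    for command in plan:
        print(shlex.join(command))
```

The snapshot size only has to cover the writes the guest makes while the backup runs, not the whole volume.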

Since you are using QCOW2 images, your best option would be to treat the VMs
as if they were just bare metal servers and rsync over the virtual network
(à la 'rsync -a vmhostname:/ /backupdisk/vmhostname_backup/', run on the
backup server, since rsync cannot copy between two remote hosts) and
not even try to back up the QCOW2 image files, except maybe once in a while for
"disaster" recovery purposes (e.g. if you need to recreate the VM from scratch
from a known state).



-- 
Robert Heller             -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software        -- Custom Software Services
http://www.deepsoft.com/  -- Linux Administration Services
heller at deepsoft.com       -- Webhosting Services

Leon Fauster

2021-Mar-31 13:58 UTC

As others pointed out, LVM would be a smart solution, and BTW rsnapshot
supports LVM snapshot backups.

If you want a raw approach against the image file, then use a
deduplicating backup tool (block-based backups).
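
As a rough illustration of what block-based deduplication means for an image file, here is a small Python sketch. It is an assumption-laden toy, not the design of any particular tool: it cuts the image into fixed-size chunks and stores each distinct chunk once under its SHA-256 digest, whereas real deduplicating tools (BorgBackup and restic, for example) use content-defined chunking plus compression and encryption. The names `backup_image`, `restore_image` and the chunk size are made up for the example.

```python
import hashlib
from pathlib import Path
from typing import List

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, an arbitrary choice for this sketch


def backup_image(image: Path, store: Path, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Store each distinct chunk of `image` once; return the ordered manifest."""
    store.mkdir(parents=True, exist_ok=True)
    manifest = []
    with image.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            blob = store / digest
            if not blob.exists():        # only chunks never seen before cost space
                blob.write_bytes(chunk)
            manifest.append(digest)
    return manifest


def restore_image(manifest: List[str], store: Path, target: Path) -> None:
    """Rebuild an image by concatenating the chunks named in the manifest."""
    with target.open("wb") as out:
        for digest in manifest:
            out.write((store / digest).read_bytes())
```

Backing up the same image the next day adds only the chunks that changed, which is what keeps daily backups of a mostly static 100 GB image manageable. The image should come from a snapshot or a shut-down guest, otherwise the copy may be inconsistent.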

--
Leon

Simon Matter

2021-Mar-31 14:58 UTC

We're doing rsnapshot based backups for everything, VMs and bare metal
systems. We don't care about KVM image files for backups.

When a new host is included in the backup, we first make a hard-link-based
copy of another, similar server's backup on the backup server. That way most
of the OS is already there on the backup server and the real backup consumes
only a little space.

The only problem we had with rsnapshot is that rsync by default can't
handle a lot of hard links. We're now using our own build of rsync 3.2.3
with --max-alloc=0, and multi-million hard links are not a problem anymore.

Regards,
Simon

Gionatan Danti

2021-Mar-31 19:35 UTC

Hi Nicolas,
the simplest approach would be to use a filesystem which natively
supports send/recv to another host.

You may be tempted to use btrfs, but having tested it I strongly advise
against it: it will fragment horribly and performance will be bad even
with CoW disabled (which, by the way, is automatically re-enabled by
snapshots).

I currently just use ZFS on Linux and it works very well. However, using
it on CentOS is not trouble-free and it has its own CLI and specific
issues to be aware of; so, I understand if you don't want to go down this
rabbit hole.

The next best thing I can suggest is to use lvmthin and XFS, with
efficient block-level copies done to another host via tools such as bdsync
[1] or blocksync [2] (of which I forked an advanced version). On the
receiving host, you should (again) use lvmthin and XFS with periodic
snapshots.
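
To give an idea of how such a block-level copy saves bandwidth, here is a minimal Python sketch of the general technique. It is not the actual protocol of bdsync or blocksync, and the function names and block size are assumptions made for the example. The receiving side computes one digest per block, the sending side compares its own blocks against those digests and ships only the blocks that differ, and the receiver patches them in place.

```python
import hashlib
from typing import List, Tuple

BLOCK_SIZE = 1024 * 1024  # 1 MiB, an arbitrary choice for this sketch


def block_digests(path: str, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Receiver side: one SHA-256 digest per block of the existing copy."""
    digests = []
    with open(path, "rb") as f:
        while block := f.read(block_size):
            digests.append(hashlib.sha256(block).digest())
    return digests


def make_patch(source: str, remote_digests: List[bytes],
               block_size: int = BLOCK_SIZE) -> List[Tuple[int, bytes]]:
    """Sender side: collect (offset, data) for blocks that differ or are new."""
    patch = []
    with open(source, "rb") as f:
        index = 0
        while block := f.read(block_size):
            known = index < len(remote_digests)
            if not known or hashlib.sha256(block).digest() != remote_digests[index]:
                patch.append((index * block_size, block))
            index += 1
    return patch


def apply_patch(target: str, patch: List[Tuple[int, bytes]], final_size: int) -> None:
    """Receiver side: overwrite changed blocks in place, then fix the length."""
    with open(target, "r+b") as f:
        for offset, block in patch:
            f.seek(offset)
            f.write(block)
        f.truncate(final_size)
```

Only the digests and the changed blocks need to cross the network, so a large volume with few daily changes transfers quickly. Taking a snapshot on the receiving host before applying each patch keeps the older versions available.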

Finally, I would leave the current rsnapshot backups in place: you will
simply copy from a virtual machine rather than from a bare metal host. I
have found rsnapshot really useful and reliable, so I suggest you continue
using it even if efficient block-level backups are taken.

Just my 2 cents.
Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8

Peter Eckel

2021-Apr-01 09:59 UTC

Hi Niki,

I'm using an approach similar to Stephen's, but with a twist.

* Kickstart all machines from a couple of ISOs, depending on the requirements
  (the Kickstart process is controlled by Ansible)
* Machines that have persistent data (about 50% on average) have at least two
  virtual disk devices: one for the OS (which gets overwritten by Kickstart
  when a machine is re-created), and another for persistent data (which
  Kickstart doesn't touch)
* Ansible sets up everything on the base server Kickstart provides, from
  basic OS hardening and authentication to monitoring and backup of the data
  volume
* Backup is done via Bareos to a redundant storage server

That way I can reinitialise a VM at any time without having to care for the
persistent data in most cases. If persistent data need to be restored as well,
Bareos can handle that as soon as the machine has been set up via Ansible. OS
files are never backed up at all.

An improvement I'm planning to look into is moving from Kickstart to
Terraform for the provisioning of the base machines. Currently it takes me about
10 minutes to recreate a broken VM provided the persistent data is left intact.

Cheers, 

  Peter.
