Uwe Schuerkamp <uwe.schuerkamp at nionex.net> 2009-03-13 10:42:
> Hi folks,
>
> I was wondering what is a good backup strategy for ocfs2 based
> clusters.
>
>
> Background: We're running a cluster of 8 SLES 10 SP2 machines sharing
> a common SAN-based FS (/shared) which is about 350 GB in size at the
> moment. We've already taken care of the usual optimizations concerning
> mount options on the cluster nodes (noatime and so on), but our backup
> software (bacula 2.2.8) slows to a crawl when encountering directories
> in this filesystem that contain quite a few small files. Data rates
> usually average in the tens of MB/sec doing "normal" backups of local
> filesystems on remote machines in the same LAN, but with the ocfs2 fs
> bacula is hard pressed to stay above 1 MB/sec sustained throughput,
> which obviously isn't enough to back up 350 GB of data in a sensible
> timeframe.
>
> I've already tried disabling compression, rsync'ing to another server
> and so on, but so far nothing has helped with improving data rates.
>
> How would reducing the number of cluster nodes help with backups? Is
> there a "dirty read" option in ocfs2 that would allow reading the
> files without locking them first or something similar? I don't think
> bacula is the culprit as it easily manages larger backups in the same
> environment; even reading off SMB shares is orders of magnitude faster
> in this case, so my guess is I'm missing some non-obvious
> optimization that would improve ocfs2 cluster performance.
>
> Thanks in advance for any pointers & all the best,
>
>
> Uwe
This clearly may not work for all cases and I'm sure is totally
unsupported, but our SAN (Equallogic) has the ability to take RW
snapshots, which is where we do our backups from. There was a thread a
while back about the proper way to do this. Basically, after taking the
snapshot you need to fix up the filesystem in a couple of different ways
(fsck, relabel, new UUID, etc.) so that the machine can mount several of
these at once. If anyone's interested I can post these scripts.

Since there's only one machine handling the snapshots and it's outside
of the real ocfs2 cluster, while we're doing the fixups we also convert
the snapshot to a local (non-clustered) filesystem and finally remount
it read-only. This prevents all network locking from happening (since
it's unnecessary) while the backups happen. We're doing this with a 2 TB
mail volume (~700 GB of _many_ small files) and haven't noticed any
problems with it.
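For reference, the fixup boils down to something like the sketch below.
This is a rough outline only: the device path and mount point are
examples, and the exact tunefs.ocfs2 options vary between ocfs2-tools
versions (newer tools also have --cloned-volume, which resets UUID and
label in one step), so check your man pages before trying it.

    # /dev/sdx1 = the SAN snapshot as presented to the backup host (example)
    fsck.ocfs2 -y /dev/sdx1                     # repair the snapshot image
    tunefs.ocfs2 -U /dev/sdx1                   # generate a fresh UUID
    tunefs.ocfs2 -L backup_snap /dev/sdx1       # give it a new label
    tunefs.ocfs2 --fs-features=local /dev/sdx1  # mark it a local (non-cluster) fs
    mount -t ocfs2 -o ro /dev/sdx1 /mnt/backup  # mount read-only for the backup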
I think you could probably achieve something similar by taking the
number of active nodes in the cluster down to 1 during your backup
window, but that has its own problems to be concerned with. I think a
simple "umount /shared" on all but that one node would do it (see the
sketch below).
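Something along these lines from a management host would do the trick
(the hostnames are placeholders, and you'd want to be sure nothing on
those nodes still holds files open under /shared before unmounting):

    # unmount the shared fs everywhere except node1, which runs the backup
    for host in node2 node3 node4 node5 node6 node7 node8; do
        ssh root@$host umount /shared
    done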
Brian