thr3ads.net - rsync - rsync, --sparse and VM disk images [Oct 2009]


Chris Dew

2009-Oct-09 11:48 UTC

rsync, --sparse and VM disk images

Hi Bas,

I'm not sure if this is of interest, but I also had issues with VM
disk-image sparse-files (in my case KVM, rather than VMWare), which
I've now resolved.

http://www.finalcog.com/rsync-vm-sparse-inplace-kvm-vmware

All the best,

Chris Dew.


P.S. Apologies for any breach of etiquette - I could not see Bas'
email address on
http://archives.free.net.ph/message/20090318.080211.28dac829.en.html
so I've replied via this list.


In reply to:

Author: Bas Bahlmann || Steady IT Systeembeheer
Date: 2009-03-18 08:02 UTC
To: rsync
Subject: Is it possible to make rsync VMware split .vmdk's aware?

Hi,



I am using rsync for my customers to have disaster recovery off-site
with files from a VMware Server (under Linux). All works very well, but
when I defragment the VMs (once a week) or Exchange defragments its
datastore, the disk layout changes of course, and sometimes a lot.



What do I do:

- I am making a local copy with vmware-vdiskmanager to a USB
disk in the split "thin-disk" format of the vmdks

- Then I start rsync to our datacenter to replicate the split
"thin-disk" vmdk's



What happens:

Sometimes, because of the defragment within the VM or Exchange, the disk
layout changes so much that a split .vmdk file that was very small
becomes filled with 2 GB of data. As a result rsync has to transfer 2 GB
of data for that .vmdk, which takes a lot of time. In my opinion that's
not necessary because the data is probably available in another split
.vmdk, since it was only moved across the virtual disk.



My question:

Is it possible to add an option to rsync that reads the vmdk
config file for the split disks, so it can search for known data across
all the split .vmdk files within one virtual disk? If this is possible,
it would improve the rsync process in a major way!



The .vmdk config file looks like this:



Contents of "PVSBS2K3-1.vmdk":

# Disk DescriptorFile
version=1
CID=ee057ac0
parentCID=ffffffff
createType="twoGbMaxExtentSparse"

# Extent description
RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s002.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s003.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s004.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s005.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s006.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s007.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s008.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s009.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s010.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s011.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s012.vmdk"
RW 2147202 SPARSE "PVSBS2K3-1-s013.vmdk"

# The Disk Data Base
#DDB

ddb.geometry.biosHeads = "255"
ddb.geometry.biosSectors = "63"
ddb.geometry.biosCylinders = "3265"
ddb.uuid = "60 00 C2 92 f3 f3 f2 72-66 dc e5 10 bd 92 16 44"
ddb.virtualHWVersion = "4"
ddb.toolsVersion = "6535"
ddb.geometry.cylinders = "3265"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
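
As a rough illustration of what "reading out" the descriptor would involve, the extent lines above can be parsed in a few lines of Python. This is only a sketch: the name parse_extents is hypothetical, the access keywords other than RW are assumed, and the 512-byte sector size is an assumption that the thread does not state. Under that assumption, each full extent of 4192256 sectors is exactly 2047 MiB, which fits the "twoGbMaxExtentSparse" create type.

```python
import re

SECTOR_BYTES = 512  # assumption: extent sizes are counted in 512-byte sectors

# Matches extent lines such as: RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
# (RDONLY and NOACCESS are assumed alternatives to RW.)
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"')


def parse_extents(descriptor_text):
    """Return (access, sectors, extent_type, filename) tuples in file order."""
    extents = []
    for line in descriptor_text.splitlines():
        match = EXTENT_RE.match(line.strip())
        if match:
            access, sectors, extent_type, filename = match.groups()
            extents.append((access, int(sectors), extent_type, filename))
    return extents


# Two lines taken from the descriptor above, plus a non-extent line.
sample = '''
# Extent description
RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
RW 2147202 SPARSE "PVSBS2K3-1-s013.vmdk"
ddb.adapterType = "lsilogic"
'''

for access, sectors, extent_type, filename in parse_extents(sample):
    print(filename, sectors * SECTOR_BYTES, "bytes")
```

Comment lines and the ddb.* settings are simply skipped, so the result is the ordered list of extent files that make up one virtual disk.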





I am looking forward to your answer,



Thanks in advance,



Bas Bahlmann

The Netherlands



--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

--

http://www.finalcog.com/

Ryan Malayter

2009-Oct-09 13:32 UTC

rsync, --sparse and VM disk images

From: Bas Bahlmann || Steady IT Systeembeheer

> I am using rsync for my customers to have disaster recovery off-site
> with files from a VMware Server (under Linux). All works very well, but
> when I defragment the VMs (once a week) or Exchange defragments its
> datastore, the disk layout changes of course, and sometimes a lot.

Defragmenting a virtual disk file is usually not a good idea with most
modern shared storage subsystems (the kind often used with VMware).
NetApp, LeftHand, EqualLogic, and most other arrays already
"virtualize" the block layout so they can do things like snapshots. So
defragmenting really doesn't help performance much and may actually
make things much worse. It also often breaks "thin provisioning" at
either the VMware or disk array layers, since new blocks are written
but the old ones are still allocated even though they are empty.

It also of course makes things tough on rsync, as massive amounts of
data change. While the file data blocks might still get matched if you
force a small block size, this increases CPU utilization for rsync
drastically. And the "index" structures of the filesystem will likely
be completely different and not be matched at all by rsync.
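
To put rough numbers on the block-size trade-off: the number of blocks that have to be checksummed for a file grows as the block size shrinks. The sketch below is plain arithmetic, not rsync code. The three block sizes are illustrative values only (such a value could be passed to rsync with --block-size), and the extent size assumes 512-byte sectors.

```python
def block_count(file_bytes, block_bytes):
    """Number of blocks (and so per-block checksums) needed to cover one file."""
    return -(-file_bytes // block_bytes)  # integer ceiling division


# One full extent from the descriptor in the original post,
# assuming 512-byte sectors: 4192256 sectors.
extent_bytes = 4192256 * 512

for block_bytes in (128 * 1024, 4096, 700):
    print(f"block size {block_bytes:>6} bytes -> "
          f"{block_count(extent_bytes, block_bytes)} blocks per extent")
```

Each of those blocks needs its own checksums to be computed, sent, and looked up, which is where the extra CPU time for small block sizes comes from.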

This is very similar to the problem of rsyncing database backup files
(Exchange, SQL Server, mysql, Oracle, whatever) that have had the
indexes rebuilt. There have been several threads on this recently.

> Sometimes, because of the defragment within the VM or Exchange, the disk
> layout changes so much that a split .vmdk file that was very small
> becomes filled with 2 GB of data. As a result rsync has to transfer 2 GB
> of data for that .vmdk, which takes a lot of time. In my opinion that's
> not necessary because the data is probably available in another split
> .vmdk, since it was only moved across the virtual disk.

Again, defragmenting VMs usually is not helpful for this very reason.
Once blocks get allocated, you can't get them back. Also, using the 2
GB split disks has always caused me problems. VMFS and every other
modern filesystem has no trouble with very big files, so keep things
simple and just use a single large VMDK for each virtual disk.

> Is it possible to add an option to rsync that reads the vmdk
> config file for the split disks, so it can search for known data across
> all the split .vmdk files within one virtual disk? If this is possible,
> it would improve the rsync process in a major way!

A VMware-specific enhancement to rsync is likely a non-starter.
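
For anyone who wants to estimate how much data really did just move between extents, the idea can be tried outside rsync. The following is a minimal sketch and not how rsync works: rsync compares each file only against its own counterpart on the other side and uses a rolling checksum, whereas this sketch hashes fixed, aligned blocks across a whole set of old extent files. The function names, the 64 KiB default block size, and the demo file names are all hypothetical.

```python
import hashlib
import os
import tempfile

BLOCK_BYTES = 64 * 1024  # hypothetical block size, chosen only for illustration


def iter_blocks(path, block_bytes=BLOCK_BYTES):
    """Yield fixed-size blocks of a file (the last one may be shorter)."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_bytes)
            if not block:
                break
            yield block


def build_index(old_paths, block_bytes=BLOCK_BYTES):
    """Collect digests of every aligned block in the previously synced extents."""
    index = set()
    for path in old_paths:
        for block in iter_blocks(path, block_bytes):
            index.add(hashlib.sha256(block).digest())
    return index


def count_known_blocks(new_path, index, block_bytes=BLOCK_BYTES):
    """Return (known, total): blocks of new_path that exist in any old extent."""
    known = total = 0
    for block in iter_blocks(new_path, block_bytes):
        total += 1
        if hashlib.sha256(block).digest() in index:
            known += 1
    return known, total


# Tiny demo with 8-byte blocks and made-up file contents.
with tempfile.TemporaryDirectory() as workdir:
    old_a = os.path.join(workdir, "old-s001.bin")
    old_b = os.path.join(workdir, "old-s002.bin")
    new_a = os.path.join(workdir, "new-s001.bin")
    with open(old_a, "wb") as f:
        f.write(b"A" * 8 + b"B" * 8)
    with open(old_b, "wb") as f:
        f.write(b"C" * 8)
    with open(new_a, "wb") as f:
        # "C" moved here from the other extent, "A" shifted, "D" is new data.
        f.write(b"C" * 8 + b"A" * 8 + b"D" * 8)

    index = build_index([old_a, old_b], block_bytes=8)
    known, total = count_known_blocks(new_a, index, block_bytes=8)

print(f"{known} of {total} blocks already exist somewhere in the old extents")
```

Because only aligned blocks are compared, data that moved by an offset that is not a multiple of the block size will not be found, so a result from this kind of check is at best a lower bound.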
-- 
RPM
