Hi Bas,

I'm not sure if this is of interest, but I also had issues with VM disk-image sparse files (in my case KVM, rather than VMware), which I've now resolved.

http://www.finalcog.com/rsync-vm-sparse-inplace-kvm-vmware

All the best,

Chris Dew.

P.S. Apologies for any breach of etiquette - I could not see Bas' email address on http://archives.free.net.ph/message/20090318.080211.28dac829.en.html so I've replied via this list.

In reply to:

Author: Bas Bahlmann || Steady IT Systeembeheer
Date: 2009-03-18 08:02 UTC
To: rsync
Subject: Is it possible to make rsync VMware split .vmdk's aware?

Hi,

I am using rsync for my customers to have disaster recovery off-site with files from a VMware Server (under Linux). All works very well, but when I defragment the VMs (once a week) or Exchange defragments its datastore, the disk layout changes of course, and sometimes a lot.

What do I do:
- I am making a local copy with vmware-vdiskmanager to a USB disk in the split "thin-disk" format of the vmdk's
- Then I start rsync to our datacenter to replicate the split "thin-disk" vmdk's

What happens:
Sometimes, because of the defragment within the VM or Exchange, the disk layout changes so much that a split .vmdk file that was very small becomes filled with 2 GB of data. As a result rsync has to transfer 2 GB of data for that .vmdk, which takes a lot of time. In my opinion that's not necessary, because the data is probably available in another split .vmdk, since it was only moved across the virtual disk.

My question:
Is it possible to make an option in rsync which reads out the vmdk config file for the split disks so it can search for known data across all the split .vmdk files within one virtual disk? If this is possible this will improve the rsync process in a major way!
The .vmdk config file looks like this:

Contents of "PVSBS2K3-1.vmdk":

# Disk DescriptorFile
version=1
CID=ee057ac0
parentCID=ffffffff
createType="twoGbMaxExtentSparse"

# Extent description
RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s002.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s003.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s004.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s005.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s006.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s007.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s008.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s009.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s010.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s011.vmdk"
RW 4192256 SPARSE "PVSBS2K3-1-s012.vmdk"
RW 2147202 SPARSE "PVSBS2K3-1-s013.vmdk"

# The Disk Data Base
#DDB

ddb.geometry.biosHeads = "255"
ddb.geometry.biosSectors = "63"
ddb.geometry.biosCylinders = "3265"
ddb.uuid = "60 00 C2 92 f3 f3 f2 72-66 dc e5 10 bd 92 16 44"
ddb.virtualHWVersion = "4"
ddb.toolsVersion = "6535"
ddb.geometry.cylinders = "3265"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"

I am looking forward to your answer,

Thanks in advance,
Bas Bahlmann
The Netherlands

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

--
http://www.finalcog.com/
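For anyone who wants to experiment with the idea in the question, the extent list in the descriptor quoted above can be read with a few lines of Python. This is only a sketch: it assumes the descriptor is stored one entry per line, that extent sizes are counted in 512-byte sectors, and that the access keyword is one of RW, RDONLY or NOACCESS. The names `parse_extents` and `SAMPLE` are made up for the illustration and are not part of rsync or any VMware tool.

```python
import re

SECTOR_BYTES = 512  # assumption: extent sizes in the descriptor are 512-byte sectors

# One extent line looks like: RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"')


def parse_extents(descriptor_text):
    """Return a list of (filename, size_in_bytes), one entry per extent line."""
    extents = []
    for line in descriptor_text.splitlines():
        match = EXTENT_RE.match(line.strip())
        if match:
            _access, sectors, _kind, filename = match.groups()
            extents.append((filename, int(sectors) * SECTOR_BYTES))
    return extents


# Shortened copy of the descriptor from the message (extents s002 to s012 left out).
SAMPLE = '''\
# Disk DescriptorFile
version=1
createType="twoGbMaxExtentSparse"

# Extent description
RW 4192256 SPARSE "PVSBS2K3-1-s001.vmdk"
RW 2147202 SPARSE "PVSBS2K3-1-s013.vmdk"
'''

if __name__ == "__main__":
    for name, size in parse_extents(SAMPLE):
        print(f"{name}: {size / 2**30:.2f} GiB")
```

Comment lines, the header fields and the ddb.* entries do not match the pattern and are simply skipped.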
From: Bas Bahlmann || Steady IT Systeembeheer

> I am using rsync for my customers to have disaster recovery off-site
> with files from a VMware Server (under Linux). All works very well, but
> when I defragment the VMs (once a week) or Exchange defragments its
> datastore, the disk layout changes of course, and sometimes a lot.

Defragmenting a virtual disk file is usually not a good idea with most modern shared storage subsystems (the kind often used with VMware). NetApp, LeftHand, EqualLogic, and most other arrays already "virtualize" the block layout so they can do things like snapshots. So defragmenting really doesn't help performance much and may actually make things much worse. It also often breaks "thin provisioning" at either the VMware or disk array layers, since new blocks are written but the old ones are still allocated even though they are empty.

It also of course makes things tough on rsync, as massive amounts of data change. While the file data blocks might still get matched if you force a small block size, this increases CPU utilization for rsync drastically. And the "index" structures of the filesystem will likely be completely different and not be matched at all by rsync.

This is very similar to the problem of rsyncing database backup files (Exchange, SQL Server, MySQL, Oracle, whatever) that have had the indexes rebuilt. There have been several threads on this recently.

> Sometimes, because of the defragment within the VM or Exchange, the disk
> layout changes so much that a split .vmdk file that was very small
> becomes filled with 2 GB of data. As a result rsync has to transfer 2 GB
> of data for that .vmdk, which takes a lot of time. In my opinion that's
> not necessary, because the data is probably available in another split
> .vmdk, since it was only moved across the virtual disk.

Again, defragmenting VMs usually is not helpful for this very reason. Once blocks get allocated, you can't get them back.
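To put rough numbers on the block-size point above: the number of blocks rsync has to checksum and look up grows in inverse proportion to the block size (the value passed to `--block-size`). The sketch below only counts blocks for one full extent from the descriptor quoted earlier in the thread; it does not measure CPU time. It assumes the extent size is given in 512-byte sectors, and the three block sizes are arbitrary illustrative values, not rsync defaults.

```python
SECTOR_BYTES = 512        # assumption: descriptor extent sizes are 512-byte sectors
EXTENT_SECTORS = 4192256  # one full extent, taken from the descriptor in the thread


def block_count(file_bytes, block_bytes):
    """Blocks (and therefore block checksums) needed to cover file_bytes."""
    return (file_bytes + block_bytes - 1) // block_bytes  # integer ceiling division


if __name__ == "__main__":
    extent_bytes = EXTENT_SECTORS * SECTOR_BYTES
    for block_bytes in (512, 4096, 131072):  # illustrative sizes only
        blocks = block_count(extent_bytes, block_bytes)
        print(f"block size {block_bytes:>6}: {blocks:>8} blocks")
```

Going from 128 KiB blocks to 4 KiB blocks multiplies the number of checksums per extent by 32, which gives a feel for why a forced small block size gets expensive across thirteen extents.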
Also, using the 2 GB split VMDKs has always caused me problems. VMFS and every other modern filesystem have no trouble with very big files, so keep things simple and just use a single large VMDK for each virtual disk.

> Is it possible to make an option in rsync which reads out the vmdk
> config file for the split disks so it can search for known data across
> all the split .vmdk files within one virtual disk? If this is possible
> this will improve the rsync process in a major way!

A VMware-specific enhancement to rsync is likely a non-starter.

--
RPM
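To make the request in the quoted question concrete, here is a toy comparison of per-file matching against matching across all extents of one virtual disk. It is a simplified sketch and not how rsync is implemented: it hashes fixed, aligned blocks with SHA-1, whereas rsync uses a rolling checksum that can find a block at any byte offset, by default only within the same-named file on the receiving side. The function names, the tiny in-memory "extents" and the 4096-byte block size are all hypothetical.

```python
import hashlib

BLOCK = 4096  # hypothetical block size for the illustration


def block_hashes(data, block=BLOCK):
    """Set of SHA-1 digests of the aligned blocks in data."""
    return {hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)}


def matched_blocks(data, known, block=BLOCK):
    """Count aligned blocks of data whose digest is in the set known."""
    return sum(1 for i in range(0, len(data), block)
               if hashlib.sha1(data[i:i + block]).digest() in known)


def compare(old_extents, new_extents):
    """Map extent name -> (blocks matched in same-named old extent,
    blocks matched in any old extent)."""
    per_file = {name: block_hashes(data) for name, data in old_extents.items()}
    all_known = set().union(*per_file.values()) if per_file else set()
    return {name: (matched_blocks(data, per_file.get(name, set())),
                   matched_blocks(data, all_known))
            for name, data in new_extents.items()}


# Toy data: a defragment moves block B from extent s001 into extent s002.
A, B, Z = b"\x01" * BLOCK, b"\x02" * BLOCK, b"\x00" * BLOCK
OLD = {"s001": A + B, "s002": Z + Z}
NEW = {"s001": A + Z, "s002": B + Z}

if __name__ == "__main__":
    for name, (same, anywhere) in sorted(compare(OLD, NEW).items()):
        print(f"{name}: {same} of 2 blocks matched per file, "
              f"{anywhere} of 2 matched across all extents")
```

In this toy case each new extent finds only one of its two blocks in its own old copy, but both blocks somewhere among the old extents, which is the saving the question is after. A single large VMDK, as suggested above, gets the same effect without any rsync changes, because moved data then stays inside one file.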