Martin Baum
2010-Jan-06 11:00 UTC
Optimizing dd images of ext3 partitions: Only copy blocks in use by fs
Hello,

for bare-metal recovery I need to create complete disk images of ext3 partitions of about 30 servers. I'm doing this by creating lvm2 snapshots and then dd'ing the snapshot device to my backup media. (I am aware that backups created by this procedure are the equivalent of hitting the power switch at the time the snapshot was taken.)

This works great and avoids a lot of seeks on highly utilized file systems. However, it wastes a lot of space for disks with nearly empty filesystems.

It would be a lot better if I could read only the blocks from the raw disk that are really in use by ext3 (the rest could be sparse in the image file created). Is there a way to do this?

I am aware that e2image -r dumps all metadata. Is there a tool that dumps not only the metadata but also the data blocks? (Maybe even in a way that avoids seeks by compiling a list of blocks first and then reading them in disk order.) If not: is there a tool I can extend to do so, or can you point me in the right direction?

(I tried dumpfs, however it dumps inodes on a per-directory basis. Skimming through the source I did not see any optimization regarding seeks, so on highly populated filesystems dumpfs is still slower for me than full images with dd.)

Thanks a lot,
Martin
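(For concreteness, a minimal sketch of the snapshot-and-dd procedure described above; the volume group, logical volume, snapshot size and backup path are placeholders, not names from the original mail:

    # take a consistent point-in-time snapshot of the live volume
    lvcreate --snapshot --size 5G --name root-snap /dev/vg0/root

    # image the snapshot device; note that this still reads and stores
    # every block, used or not, which is the waste complained about above
    dd if=/dev/vg0/root-snap of=/backup/root.img bs=1M

    # drop the snapshot once the image is written
    lvremove -f /dev/vg0/root-snap
)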
Andreas Dilger
2010-Jan-06 21:09 UTC
Optimizing dd images of ext3 partitions: Only copy blocks in use by fs
On 2010-01-06, at 04:00, Martin Baum wrote:
> for bare-metal recovery I need to create complete disk images of
> ext3 partitions of about 30 servers. I'm doing this by creating
> lvm2-snapshots and then dd'ing the snapshot-device to my backup
> media. (I am aware that backups created by this procedure are the
> equivalent of hitting the power switch at the time the snapshot was
> taken.)
>
> This works great and avoids a lot of seeks on highly utilized file
> systems. However it wastes a lot of space for disks with nearly
> empty filesystems.
>
> It would be a lot better if I could only read the blocks from raw
> disk that are really in use by ext3 (the rest could be sparse in the
> imagefile created). Is there a way to do this?

You can use "dump", which will read only the in-use blocks, but it doesn't create a full disk image.

The other trick that I've used for similar situations is to write a file of all zeroes to the filesystem until it is full (e.g. dd if=/dev/zero of=/foo) and then the backup will be able to compress quite well. If the filesystem is in use, you should stop before the filesystem is completely full, and also unlink the file right after it is created, so in case of trouble the file will automatically be removed (even after a crash).

> I am aware that e2image -r dumps all metadata. Is there a tool that
> does not only dump metadata but also the data blocks? (maybe even in
> a way that avoids seeks by compiling a list of blocks first and then
> reading them in disk-order) If not: Is there a tool I can extend to
> do so / can you point me in the right direction?
>
> (I tried dumpfs, however it dumps inodes on a per-directory basis.
> Skimming through the source I did not see any optimization regarding
> seeks. So on highly populated filesystems dumpfs still is slower
> than full images with dd for me.)

Optimizing dump to e.g. sort inodes might help the performance, if that isn't already done.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
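(A minimal sketch of the zero-fill trick Andreas describes, assuming the filesystem to be imaged is mounted at /mnt/data and that a gzip-compressed image is acceptable; all paths and device names are placeholders:

    # fill the free space with zeroes so unused blocks become trivially
    # compressible; on a live filesystem, stop well before it is completely full
    dd if=/dev/zero of=/mnt/data/zerofill bs=1M
    rm /mnt/data/zerofill
    sync

    # then take the lvm2 snapshot as before and compress the raw image;
    # the zeroed, unused blocks shrink away in the archive
    dd if=/dev/vg0/root-snap bs=1M | gzip > /backup/root.img.gz

Andreas's suggestion to unlink the fill file right after it is created, while it is still being written, would need a small helper that opens the file, unlinks it, and then writes until the disk is full; the plain dd-then-rm above is the simpler variant of the same idea.)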