I have been doing some experiments with rsync on btrfs, a
copy-on-write file system that, depending on your requirements, is
either approaching production-ready status or has just reached it.
For my purposes the reliability appears, by almost all accounts, to be
there, and the compression alone makes it very compelling.
However, the following two experiments show rsync behaviors that are
disappointing to the point of looking like bugs. rsync would certainly
be more powerful if they were fixed. Of course, this assumes that I
have not missed something in my tests.
--
Bottom line on top: rsync with --inplace appears to (wastefully)
rewrite the entire file even when only a single block, or just the
meta-data (timestamp), has changed. While this may be necessary behavior
on some file systems, it is wasteful on copy-on-write systems.
I propose that, when --inplace is in effect, rsync write only the blocks
that have actually changed, and that it change only the meta-data when
that is all that differs, where the underlying filesystem supports this.
In the experiments below the final results would have been 20 GB - 4 KB
smaller had these changes been in place (10 GB - 4 KB saved in the first
test, plus a further 10 GB in the second).
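To illustrate the first change, here is a rough, untested sketch of the
"compare before write" idea for a single 4 KiB block, using nothing but
cmp and dd. The helper name and the 4 KiB block size are just made up
for this example; rsync's receiver would do the equivalent internally.

    # Copy 4 KiB block number $3 (0-based) of $1 into $2, but only
    # if the destination block actually differs from the source block.
    copy_block_if_changed() {
        local src=$1 dst=$2 blk=$3
        if ! cmp -s <(dd if="$src" bs=4k skip="$blk" count=1 2>/dev/null) \
                    <(dd if="$dst" bs=4k skip="$blk" count=1 2>/dev/null); then
            dd if="$src" of="$dst" bs=4k skip="$blk" seek="$blk" count=1 \
                conv=notrunc 2>/dev/null
        fi
    }

On a copy-on-write filesystem, skipping the write for an unchanged block
means the destination keeps sharing that extent with its snapshots,
which is exactly the saving the experiments below fail to get.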
(Running rsync 3.0.9)
################################################
## Test rsync --inplace
## 1) Start with an empty filesystem
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 36M 293G 1% /vol/jobarchive_Ajobarchivetest2
## 2) Create a subvolume. Put one file with 10gb of random data in it.
## Note: Compression is turned on, but our random data defeats it.
$ btrfs subvolume create src
$ time dd if=/dev/urandom of=src/10gb bs=4k count=2621440 conv=notrunc
2621440+0 records in
2621440+0 records out
10737418240 bytes (11 GB) copied, 811.427 s, 13.2 MB/s
0.400u 806.115s 13:31.42 99.3% 0+0k 0+20971520io 0pf+0w
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 11G 283G 4% /vol/jobarchive_Ajobarchivetest2
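## (Aside on the compression note above: whether compression is active
## depends on the mount options, typically compress=zlib or compress=lzo;
## they show up in /proc/mounts, e.g. "grep btrfs /proc/mounts" (not run
## here). Random data is incompressible either way, so compression does
## not change the numbers in these tests.)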
## 3) Create a second subvolume called current. Copy the first file into it.
$ btrfs subvolume create current
$ time cp --archive src/* current
0.057u 17.389s 0:42.29 41.2% 0+0k 19737984+20971520io 0pf+0w
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 20G 274G 7% /vol/jobarchive_Ajobarchivetest2
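## (Aside: on btrfs, "cp --reflink=always src/10gb current/10gb" would
## have created a CoW copy that shares extents with the original and
## consumes almost no extra space; the plain cp --archive above writes a
## full second copy, which is why usage jumps by roughly 10 GB.)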
## 4) Make a snapshot of the second volume called job1. Note that it
## takes up almost no space.
$ btrfs subvolume snapshot current job1
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 21G 273G 7% /vol/jobarchive_Ajobarchivetest2
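## (Aside: "btrfs filesystem df ." gives a finer-grained breakdown of
## data vs. metadata usage than plain df, which helps when measuring how
## little a snapshot really costs; not run here.)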
## 5) Change the first 4k bytes of the original file
$ time dd if=/dev/urandom of=src/10gb bs=4k count=1 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000601676 s, 6.8 MB/s
0.001u 0.001s 0:00.03 0.0% 0+0k 32+8io 1pf+0w
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 21G 273G 7% /vol/jobarchive_Ajobarchivetest2
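## (To confirm that only the first 4 KiB block now differs between the
## two copies, something like the following should report a last
## differing offset of at most 4096; not run here, and it reads both
## files end to end:)
cmp -l src/10gb current/10gb | tail -1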
## 6) Use rsync --inplace to make a copy of the first file.
## Note:
## - We use --inplace to copy over the existing file
## - We do not use -W aka --whole-file so the delta-xfer algorithm
##   should be in play
## - The hope is that rsync will only rewrite the first block of the file
$ time \
    /usr/share/sbtools-sbjobarchive/external-apps/rsync/rsync-3.0.9/install/centos5-64/bin/rsync \
    --stats -az --timeout=600 --inplace src/ current/
Number of files: 2
Number of files transferred: 1
Total file size: 10737418240 bytes
Total transferred file size: 10737418240 bytes
Literal data: 10737418240 bytes
Matched data: 0 bytes
File list size: 52
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10742659175
Total bytes received: 34
sent 10742659175 bytes received 34 bytes 11012464.59 bytes/sec
total size is 10737418240 speedup is 1.00
851.783u 79.265s 16:14.78 95.5% 0+0k 19752416+20971520io 17pf+0w
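## Note the "Matched data: 0 bytes" above: rsync implies -W/--whole-file
## when both the source and the destination are local paths, so forcing
## the delta-xfer algorithm on a local copy needs --no-whole-file added
## explicitly, e.g. (not run here):
rsync --stats -az --timeout=600 --inplace --no-whole-file src/ current/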
## 7) Alas, the new file takes up a full extra 10 GB.
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 31G 263G 11% /vol/jobarchive_Ajobarchivetest2
## Conclusion: rsync rewrote the entire file into current/10gb even
## though it only needed to write the first 4 KB block. Had it written
## only that block, we would have saved 10 GB - 4 KB of disk.
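## (One way to confirm this at the extent level is to compare physical
## offsets between the rewritten file and the job1 snapshot, e.g. with
## "filefrag -v current/10gb" and "filefrag -v job1/10gb" (not run here).
## Had rsync rewritten only the first block, nearly all physical offsets
## would still match; after a full rewrite none of them do.)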
################################################
## Test metadata-only change
## Start with files as above
$ btrfs subvolume snapshot current job2
Create a snapshot of 'current' in './job2'
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 31G 263G 11% /vol/jobarchive_Ajobarchivetest2
## Change the meta-data of the first file then rsync with --inplace
$ touch src/10gb
$ time \
    /usr/share/sbtools-sbjobarchive/external-apps/rsync/rsync-3.0.9/install/centos5-64/bin/rsync \
    --stats -az --timeout=600 --inplace src/ current/
Number of files: 2
Number of files transferred: 1
Total file size: 10737418240 bytes
Total transferred file size: 10737418240 bytes
Literal data: 10737418240 bytes
Matched data: 0 bytes
File list size: 52
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10742659172
Total bytes received: 31
sent 10742659172 bytes received 31 bytes 10620523.19 bytes/sec
total size is 10737418240 speedup is 1.00
920.122u 82.526s 16:50.71 99.2% 0+0k 20469728+20971520io 0pf+0w
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/jobarchive-Ajobarchivetest2
300G 41G 253G 14% /vol/jobarchive_Ajobarchivetest2
## Conclusion: the entire file was rewritten on the destination even though
## only the meta-data had changed. This is not necessary on a copy-on-write system.
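## (Assuming the data really is unchanged, as it is here, the end state
## rsync needs to produce is just the new timestamp, i.e. the equivalent
## of "touch --reference=src/10gb current/10gb", with no data blocks
## rewritten at all. That is the second change proposed above.)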
--
Allen.Supynuk at gmail.com