I have a backup volume on an ext4 filesystem that uses rsync and its
--link-dest option to create "hard-linked incremental" backups. I am
sure everyone here is familiar with the technique, but in case anyone
isn't, each backup effectively does:

# cp -al /backup/previous-backup/ /backup/current-backup
# rsync -aAHX ... --exclude /backup / /backup/current-backup

The shortcoming of this, of course, is that it takes just 1 changed
byte in a (possibly huge) file to require that the whole file be
recopied to the backup. btrfs and its CoW capability to the rescue --
again, no surprise to anyone here. So I replicated a few of the
directories in my backup volume to a btrfs volume, using a snapshot for
each backup, to take advantage of CoW and, with any luck, avoid
duplicating an entire file where only some subset of it has changed.

Overall, it seems that I saw success. Most backups on btrfs were
smaller than their ext4 counterparts, and over all of the backups
replicated, total use was less. Some, however, were significantly
larger. Here's the analysis:

Backup       btrfs    ext4
------       -----    ----
monthly.22:  112GiB   113GiB   98%
monthly.21:   14GiB    14GiB   95%
monthly.20:   19GiB    20GiB   94%
monthly.19:   12GiB    13GiB   94%
monthly.18:    5GiB     6GiB   87%
monthly.17:   11GiB    12GiB   92%
monthly.16:    8GiB    10GiB   82%
monthly.15:   16GiB    11GiB  146%
monthly.14:   19GiB    20GiB   94%
monthly.13:   21GiB    22GiB   96%
monthly.12:   61GiB    67GiB   91%
monthly.11:   24GiB    22GiB  106%
monthly.10:   22GiB    19GiB  114%
monthly.9:    12GiB    13GiB   90%
monthly.8:    15GiB    17GiB   91%
monthly.7:     9GiB    11GiB   87%
monthly.6:     8GiB     9GiB   85%
monthly.5:    16GiB    18GiB   91%
monthly.4:    13GiB    15GiB   89%
monthly.3:    11GiB    19GiB   62%
monthly.2:    29GiB    22GiB  134%
monthly.1:    23GiB    24GiB   94%
monthly.0:     5GiB     5GiB   94%

Total:       497GiB   512GiB   96%

btrfs use is calculated from the "df" value of the filesystem before
and after each backup. ext4 (rsync, really) use is calculated with
"du -xks" on the whole backup volume, which as you know counts a
multiply hard-linked file's space use only once.

So as you can see, for the most part btrfs and CoW were more efficient,
but in some cases (i.e. monthly.15, monthly.11, monthly.10, monthly.2)
they were less efficient. Taking the biggest anomaly, monthly.15, a du
of just that directory on both the btrfs and ext4 filesystems shows
results I would expect:

btrfs: 136,876,580  monthly.15
ext4:  142,153,928  monthly.15

Yet the before and after "df" results show the btrfs usage higher than
ext4. Is there some "periodic" jump in "overhead" used by btrfs that
would account for this mysterious increased usage in some of the
copies? Any other ideas for the anomalous results?

Cheers,
b.
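For reference, the cp -al + rsync pair above is roughly equivalent to a
single rsync invocation using --link-dest, which hard-links unchanged
files against the previous backup and copies changed files whole (a
sketch; the paths are illustrative, not the poster's actual script):

rsync -aAHX --exclude /backup \
    --link-dest=/backup/previous-backup \
    / /backup/current-backup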
On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
> # cp -al /backup/previous-backup/ /backup/current-backup
> # rsync -aAHX ... --exclude /backup / /backup/current-backup
>
> The shortcoming of this, of course, is that it takes just 1 changed
> byte in a (possibly huge) file to require that the whole file be
> recopied to the backup.

If you have snapshots anyway, why not:
- create a snapshot before each backup run
- use the same directory (e.g. just /backup), no need to "cp" anything
- add "--inplace" to rsync

--
Fajar
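A minimal sketch of that scheme, assuming /backup is a btrfs subvolume
(the snapshot location and date format are illustrative):

# preserve the current state first; the snapshot has to live on the
# same btrfs filesystem as /backup
btrfs subvolume snapshot /backup /backup/.snapshots/$(date +%F)

# then update /backup in place; blocks that do not change remain
# shared with every snapshot via CoW
rsync -aAHX --inplace --exclude /backup / /backup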
On Sun, 2011-03-06 at 10:46 -0500, Brian J. Murrell wrote:
> I have a backup volume on an ext4 filesystem that uses rsync and its
> --link-dest option to create "hard-linked incremental" backups. I am
> sure everyone here is familiar with the technique, but in case anyone
> isn't, each backup effectively does:

> So I replicated a few of the directories in my backup volume to a
> btrfs volume, using a snapshot for each backup, to take advantage of
> CoW and, with any luck, avoid duplicating an entire file where only
> some subset of it has changed.
>
> Overall, it seems that I saw success. Most backups on btrfs were
> smaller than their ext4 counterparts, and over all of the backups
> replicated, total use was less. Some, however, were significantly
> larger. Here's the analysis:

> Taking the biggest anomaly, monthly.15, a du of just that directory
> on both the btrfs and ext4 filesystems shows results I would expect:
>
> btrfs: 136,876,580  monthly.15
> ext4:  142,153,928  monthly.15
>
> Yet the before and after "df" results show the btrfs usage higher
> than ext4. Is there some "periodic" jump in "overhead" used by btrfs
> that would account for this mysterious increased usage in some of the
> copies?

There actually is such a periodic jump in overhead, caused by the way
btrfs dynamically allocates space for metadata as new files are
created, which it does whenever the free metadata space ratio reaches a
threshold (it's probably more complicated than that, but close enough
for now).

To see exactly what's going on, you should use the "btrfs filesystem
df" command to see how space is being allocated for data and metadata
separately:

ayu ~ # btrfs fi df /
Data: total=266.01GB, used=249.35GB
System, DUP: total=8.00MB, used=36.00KB
Metadata, DUP: total=3.62GB, used=1.93GB
ayu ~ # df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda4             402G  254G  145G  64% /

If you use the btrfs tool's df command to account for space in your
testing, you should get much more accurate results.

--
Calvin Walton <calvin.walton@kepstin.ca>
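A sketch of using it for per-backup accounting (filenames and the mount
point are illustrative):

btrfs filesystem df /mnt/btrfs-test > df-before.txt
# ... run one backup ...
btrfs filesystem df /mnt/btrfs-test > df-after.txt
diff df-before.txt df-after.txt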
On 11-03-06 11:02 AM, Fajar A. Nugraha wrote:
> If you have snapshots anyway, why not:
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

Which is exactly what I am doing. There is no "cp" involved in making
the btrfs copies of the existing backups; it's simply "rsync -aAXH ...
--inplace" from the existing backup archive to the new btrfs archive.

Cheers,
b.
On 11-03-06 11:06 AM, Calvin Walton wrote:
> There actually is such a periodic jump in overhead,

Ahh. So my instincts were correct.

> caused by the way btrfs dynamically allocates space for metadata as
> new files are created, which it does whenever the free metadata space
> ratio reaches a threshold (it's probably more complicated than that,
> but close enough for now).

Sounds fair enough.

> To see exactly what's going on, you should use the "btrfs filesystem
> df" command to see how space is being allocated for data and metadata
> separately:
>
> ayu ~ # btrfs fi df /
> Data: total=266.01GB, used=249.35GB
> System, DUP: total=8.00MB, used=36.00KB
> Metadata, DUP: total=3.62GB, used=1.93GB
> ayu ~ # df -h /
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda4             402G  254G  145G  64% /
>
> If you use the btrfs tool's df command to account for space in your
> testing, you should get much more accurate results.

Indeed! Unfortunately, that tool seems to be completely silent on my
system:

# btrfs filesystem df /mnt/btrfs-test/
# btrfs filesystem df /mnt/btrfs-test

where /mnt/btrfs-test is where I have mounted the device on which I
created the btrfs filesystem, i.e.:

# grep btrfs /proc/mounts
/dev/mapper/btrfs--test-btrfs--test /mnt/btrfs-test btrfs rw,relatime 0 0

My btrfs-tools appear to be from 20101101; the changelog says:

* Merging upstream version 0.19+20101101.

Cheers,
b.
On Sun, 2011-03-06 at 23:02 +0700, Fajar A. Nugraha wrote:
> On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
> > # cp -al /backup/previous-backup/ /backup/current-backup
> > # rsync -aAHX ... --exclude /backup / /backup/current-backup
> >
> > The shortcoming of this, of course, is that it takes just 1 changed
> > byte in a (possibly huge) file to require that the whole file be
> > recopied to the backup.
>
> If you have snapshots anyway, why not:
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

To add a bit to this: if you *do not* use the --inplace option on
rsync, rsync will rewrite the entire file instead of updating the
existing one! This of course negates some of the benefits of btrfs's
COW support when doing incremental backups.

--
Calvin Walton <calvin.walton@kepstin.ca>
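To illustrate (a sketch of rsync's default behaviour; the temporary
file name is made up):

# by default rsync writes a hidden temporary file and renames it over
# the destination, so the new copy shares no extents with the version
# pinned by the snapshot:
#
#   /backup/big.db          old extents, still referenced by snapshots
#   /backup/.big.db.3Fx9Qz  all-new extents, renamed over big.db
#
# with --inplace, rsync writes directly into the existing file, so
# only the blocks that actually change are CoW'd into new extents:
rsync -aAHX --inplace /source/ /backup/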
On 11-03-06 11:17 AM, Calvin Walton wrote:
> To add a bit to this: if you *do not* use the --inplace option on
> rsync, rsync will rewrite the entire file instead of updating the
> existing one!

Of course. As I mentioned to Fajar previously, I am indeed using
--inplace when copying from the existing archive to the new btrfs
archive.

> This of course negates some of the benefits of btrfs's COW support
> when doing incremental backups.

Absolutely.

b.
On Sun, Mar 6, 2011 at 8:02 AM, Fajar A. Nugraha <list@fajar.net> wrote:
> On Sun, Mar 6, 2011 at 10:46 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
>> # cp -al /backup/previous-backup/ /backup/current-backup
>> # rsync -aAHX ... --exclude /backup / /backup/current-backup
>>
>> The shortcoming of this, of course, is that it takes just 1 changed
>> byte in a (possibly huge) file to require that the whole file be
>> recopied to the backup.
>
> If you have snapshots anyway, why not:
> - create a snapshot before each backup run
> - use the same directory (e.g. just /backup), no need to "cp" anything
> - add "--inplace" to rsync

You may also want to test with/without --no-whole-file. That's most
useful when the two filesystems are on the same system, and it should
reduce the amount of data copied around, as it forces rsync to use only
file deltas. This is very much a win on ZFS, which is also CoW, so it
should be a win on Btrfs.

--
Freddie Cash
fjwcash@gmail.com
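Something like the following combination, then (a sketch; rsync
defaults to whole-file transfers when both paths are local, so
--no-whole-file has to be given explicitly, and the paths here are
illustrative):

rsync -aAHX --inplace --no-whole-file \
    /backup-ext4/monthly.15/ /mnt/btrfs-test/current/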
On 11-03-06 11:06 AM, Calvin Walton wrote:
> To see exactly what's going on, you should use the "btrfs filesystem
> df" command to see how space is being allocated for data and metadata
> separately:

OK. So with an empty filesystem, before my first copy (i.e. the base
from which the next copy will CoW), df reports:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880        56 922746824   1% /mnt/btrfs-test

and btrfs fi df reports:

Data: total=8.00MB, used=0.00
Metadata: total=1.01GB, used=24.00KB
System: total=12.00MB, used=4.00KB

After the first copy, df and btrfs fi df report:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/btrfs--test-btrfs--test
                     922746880 121402328 801344552  14% /mnt/btrfs-test

root@linux:/mnt/btrfs-test# cat .snapshots/monthly.22/metadata/btrfs_df-stop
Data: total=110.01GB, used=109.26GB
Metadata: total=5.01GB, used=3.26GB
System: total=12.00MB, used=24.00KB

So it's clear that total usage (as reported by df) was 121,402,328KB,
but Metadata has two values:

Metadata: total=5.01GB, used=3.26GB

What's the difference between total and used? And for that matter,
what's the difference between the total and used for Data
(total=110.01GB, used=109.26GB)?

Even if I take the larger values (i.e. the totals) for Data and
Metadata (each converted to KB first) and add them up, they come to
120,607,211.52KB, which is not quite the 121,402,328KB that df reports;
there is a 795,116.48KB discrepancy.

In any case, which value from "btrfs fi df" should I be subtracting
from df's accounting to get a real accounting of the amount of data
used?

Cheers,
b.
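As a quick check of the arithmetic above (the GB figures in the btrfs
output are binary, so the conversion goes through 1024 twice; results
in KB):

$ echo '(110.01 + 5.01) * 1024 * 1024' | bc
120607211.52
$ echo '121402328 - 120607211.52' | bc
795116.48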
I'm not a developer, but I think it goes something like this: btrfs
doesn't write the filesystem on the entire device/partition at format
time; rather, it dynamically increases the size of the filesystem as
data is used. That's why formatting a disk in btrfs can be so fast.

On Wed, Mar 23, 2011 at 12:39 PM, Brian J. Murrell <brian@interlinx.bc.ca> wrote:
> What's the difference between total and used? And for that matter,
> what's the difference between the total and used for Data
> (total=110.01GB, used=109.26GB)?
>
> In any case, which value from "btrfs fi df" should I be subtracting
> from df's accounting to get a real accounting of the amount of data
> used?
On 11-03-23 11:53 AM, Chester wrote:
> I'm not a developer, but I think it goes something like this: btrfs
> doesn't write the filesystem on the entire device/partition at format
> time; rather, it dynamically increases the size of the filesystem as
> data is used. That's why formatting a disk in btrfs can be so fast.

Indeed, this much is understood, which is why I am using btrfs fi df to
try to determine how much of the increase in raw device usage is due to
the dynamic allocation of metadata.

Cheers,
b.
> So it's clear that total usage (as reported by df) was 121,402,328KB,
> but Metadata has two values:
>
> Metadata: total=5.01GB, used=3.26GB
>
> What's the difference between total and used? And for that matter,
> what's the difference between the total and used for Data
> (total=110.01GB, used=109.26GB)?

total is the space allocated (reserved) for one kind of usage (metadata
or data); space allocated for one kind of usage can't be used for
anything else. used is the space actually consumed within that
allocation.

The wiki gives you an overview of how to interpret the values:
https://btrfs.wiki.kernel.org/index.php/FAQ#btrfs_filesystem_df_.2Fmountpoint

cheers,
Kolja.
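Given that, one way to measure what each backup actually consumes is to
sum the "used" figures before and after each run (a minimal sketch;
assumes GNU awk and the 0.19-era output format shown earlier, and the
btrfs_used_kb helper name is just for illustration):

btrfs_used_kb() {
    # sum the used= figures (Data, Metadata, System) in KB
    btrfs filesystem df "$1" | awk '
        match($0, /used=([0-9.]+)(GB|MB|KB)/, m) {
            mult = (m[2] == "GB") ? 1024 * 1024 : (m[2] == "MB") ? 1024 : 1
            sum += m[1] * mult
        }
        END { printf "%.0f\n", sum }'
}

before=$(btrfs_used_kb /mnt/btrfs-test)
# ... take the snapshot and run the rsync --inplace here ...
after=$(btrfs_used_kb /mnt/btrfs-test)
echo "backup consumed $((after - before)) KB"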