Hello,

I have observed extremely slow metadata performance with btrfs.  This
may be a bit of a nightmare scenario; it involves untarring a backup
of 1.6TB of backuppc data, which contains millions of hardlinks and
much data, onto USB 2.0 disks.

I have run disk monitoring tools such as dstat while performing these
operations to see what's going on.  The behavior I notice is this:

 * When unpacking large files, the USB drives sustain activity in the
   20-40 MB/s range, as expected.
 * When creating vast numbers of hardlinks instead, the activity is
   roughly this:
    o Bursts of output from tar due to -v, sometimes corresponding to
      reads in the 300KB/s range (I suspect this has to do with
      caching)
    o Tar blocked for minutes while writes to the disk occur, in the
      300-600KB/s range.

This occurs even when nobarrier,noatime are specified as mount
options.  I know the disk is capable of far more, because btrfs gets
far more from it when writing large files.

There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
drive.  I have tried the raid1, raid0, and single metadata profiles.
Anecdotal evidence suggests that raid1 performs the worst, raid0 the
best, and single somewhere in between.  The data is in single mode.

Is this behavior known and expected?

Thanks,

John
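For reference, the setup described above amounts to roughly the
following sketch; the device names, mount point, and archive path are
illustrative assumptions, not the actual commands used:

    # two-device filesystem; -m selects the metadata profile (raid1,
    # raid0, and single were all tried), -d the data profile (single)
    mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc   # devices assumed

    # mount without write barriers or atime updates, as in the test
    mount -o nobarrier,noatime /dev/sdb /mnt/backup

    # restore the backuppc archive; -v produces the bursts of output
    # mentioned above
    tar -xvf /path/to/backuppc.tar -C /mnt/backup     # path assumed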
John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:

> Hello,
>
> I have observed extremely slow metadata performance with btrfs.
> This may be a bit of a nightmare scenario; it involves untarring a
> backup of 1.6TB of backuppc data, which contains millions of
> hardlinks and much data, onto USB 2.0 disks.

> Is this behavior known and expected?

Yes.  Btrfs doesn't do well with lots of hardlinks, and indeed until
relatively recently it had a hard limit on the number of hardlinks
possible within a directory, one that hardlink-heavy use-cases would
regularly hit.  That was worked around, but there's an additional
level of indirection once the first-level link pool is filled, and
you're not the first to have observed that btrfs performance isn't
the best in that sort of scenario.  That's known.

Other filesystems will probably do quite a bit better for
hardlink-style backups and other hardlink-heavy use-cases.  Either
that, or consider using btrfs, but with some other form of backup,
possibly btrfs snapshots or COW reflinks.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
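In practice the two alternatives Duncan mentions look something like
the following; the paths are illustrative assumptions:

    # snapshot-based backups: each generation is a read-only snapshot
    # of a subvolume, sharing all unchanged extents with its siblings
    btrfs subvolume snapshot -r /mnt/backup/current \
        /mnt/backup/snap-20131205                     # paths assumed

    # reflink-based copies: each file gets its own inode (no hardlink
    # pressure on the metadata trees), but data extents are shared COW
    cp -a --reflink=always /mnt/backup/current /mnt/backup/copy-20131205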
On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:

> I have observed extremely slow metadata performance with btrfs.
> This may be a bit of a nightmare scenario; it involves untarring a
> backup of 1.6TB of backuppc data, which contains millions of
> hardlinks and much data, onto USB 2.0 disks.

How does this compare to using Ext4 on the same hardware and same
data?

> I have run disk monitoring tools such as dstat while performing
> these operations to see what's going on.
>
> The behavior I notice is this:
>
>  * When unpacking large files, the USB drives sustain activity in
>    the 20-40 MB/s range, as expected.
>  * When creating vast numbers of hardlinks instead, the activity is
>    roughly this:
>     o Bursts of output from tar due to -v, sometimes corresponding
>       to reads in the 300KB/s range (I suspect this has to do with
>       caching)
>     o Tar blocked for minutes while writes to the disk occur, in
>       the 300-600KB/s range.

Is iostat indicating that the disk is at 100% capacity?

> This occurs even when nobarrier,noatime are specified as mount
> options.  I know the disk is capable of far more, because btrfs
> gets far more from it when writing large files.

Write speeds as low as 600KB/s aren't uncommon when there are lots of
seeks.  I've seen similar performance from RAID arrays.  Is BTRFS
doing much worse than Ext4 in terms of the number of seeks needed for
writing that data?

> There are two USB drives in this btrfs filesystem: a 1TB and a 2TB
> drive.  I have tried the raid1, raid0, and single metadata
> profiles.  Anecdotal evidence suggests that raid1 performs the
> worst, raid0 the best, and single somewhere in between.  The data
> is in single mode.
>
> Is this behavior known and expected?

The last time I ran Postal tests I didn't find a lot of difference
between BTRFS and Ext4 when using a 60K average file size (from
memory) on a single partition.  BTRFS did worse when it was using
internal RAID-1 while Ext4 was on a Linux software RAID-1.  But a 60K
file size may be larger than your average file if you use lots of
links.

For backups BTRFS seems to perform a lot better (for both creating
and removing snapshots) if you use snapshots instead of "cp -rl" or
equivalent.  However, I have had ongoing problems with BTRFS hanging
on snapshot removal which aren't fixed in the latest Debian packaged
kernels.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
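The iostat question can be answered empirically with something like
the following; the device name is an assumption:

    # extended per-device statistics every 5 seconds; %util near 100
    # means the device is saturated, and a high await combined with
    # low throughput points at seek-bound (metadata-heavy) I/O
    iostat -x 5 /dev/sdb            # device name assumed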
On 12/05/2013 05:32 PM, Russell Coker wrote:

> On Thu, 5 Dec 2013 11:52:04 John Goerzen wrote:
>> I have observed extremely slow metadata performance with btrfs.
>> This may be a bit of a nightmare scenario; it involves untarring a
>> backup of 1.6TB of backuppc data, which contains millions of
>> hardlinks and much data, onto USB 2.0 disks.
>
> How does this compare to using Ext4 on the same hardware and same
> data?

Hi Russell,

I can't perform a direct apples-to-apples comparison here, because
the capabilities of the filesystems are dissimilar.  We're talking
two USB drives, one of them 1TB and the other 2TB.  With ext4, I used
LVM to combine them into a single volume (no striping).

Even with the best case in btrfs (-m raid0 -d raid0 -- yes, I now
know that wastes space), it is still slower than ext4.  Overall
performance with backuppc is somewhat slower.  Creation, and
sometimes deletion, of vast numbers of hardlinks or of vast numbers
of empty directories is much slower, and can leave processes blocked
waiting for I/O to complete for so long that they trigger kernel
hung-task warnings in dmesg with btrfs.  Even a simple ls on a
directory with <20 files can take minutes to complete while tar is
creating these directories or links.

One other datapoint: zfs, even zfs-fuse, on the exact same workload
is significantly faster than btrfs.

> Write speeds as low as 600KB/s aren't uncommon when there are lots
> of seeks.  I've seen similar performance from RAID arrays.  Is
> BTRFS doing much worse than Ext4 in terms of the number of seeks
> needed for writing that data?

The strange thing is that these writes come in bursts, during which
userland access to the filesystem is apparently paused.  This
suggests to me that there is some caching going on here (perfectly
fine).  But given that, shouldn't some reordering be taking place?
usb-storage does not support NCQ, so perhaps this is also an issue of
higher latency on USB vs. SATA/SCSI.

-- John
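For concreteness, the two configurations being compared amount to
something like this sketch; device and volume names are assumptions:

    # ext4 case: the two drives concatenated with LVM (linear, no
    # striping), one filesystem on top
    pvcreate /dev/sdb /dev/sdc                  # devices assumed
    vgcreate backup_vg /dev/sdb /dev/sdc
    lvcreate -n backup_lv -l 100%FREE backup_vg
    mkfs.ext4 /dev/backup_vg/backup_lv

    # btrfs best case tried: raid0 for both metadata and data (which
    # wastes space on unequal 1TB + 2TB drives, as noted above)
    mkfs.btrfs -m raid0 -d raid0 /dev/sdb /dev/sdc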
On Thu, Dec 05, 2013 at 07:39:30PM +0000, Duncan wrote:

> John Goerzen posted on Thu, 05 Dec 2013 11:52:04 -0600 as excerpted:
>
>> Hello,
>>
>> I have observed extremely slow metadata performance with btrfs.
>> This may be a bit of a nightmare scenario; it involves untarring a
>> backup of 1.6TB of backuppc data, which contains millions of
>> hardlinks and much data, onto USB 2.0 disks.
>
>> Is this behavior known and expected?
>
> Yes.  Btrfs doesn't do well with lots of hardlinks, and indeed
> until relatively recently it had a hard limit on the number of
> hardlinks possible within a directory, one that hardlink-heavy
> use-cases would regularly hit.  That was worked around, but there's
> an additional level of indirection once the first-level link pool
> is filled, and you're not the first to have observed that btrfs
> performance isn't the best in that sort of scenario.  That's known.
>
> Other filesystems will probably do quite a bit better for
> hardlink-style backups and other hardlink-heavy use-cases.  Either
> that, or consider using btrfs, but with some other form of backup,
> possibly btrfs snapshots or COW reflinks.

Thanks for explaining this.

I'm one of those people who uses cp -al and rsync to do backups.
Indeed, I should likely rework the flow to use subvolumes and
snapshots.  You also mentioned reflinks, and it sounds like I can use
cp -a --reflink instead of cp -al.

Also, would the dedupe code in btrfs effectively allow for the same
thing after the fact if you use cp without --reflink?  Is it stable
enough nowadays?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  |  PGP 1024R/763BE901
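The switch Marc describes is essentially a one-line change; the paths
here are illustrative assumptions:

    # old flow: hardlink farm -- every file in the new generation
    # shares an inode with the previous one, which is exactly the
    # hardlink-heavy case btrfs handles poorly
    cp -al /mnt/backup/current /mnt/backup/gen-new    # paths assumed

    # reflink flow: new inodes, shared data extents (COW), so the
    # per-inode hardlink structures never come into play
    cp -a --reflink=always /mnt/backup/current /mnt/backup/gen-new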
Marc MERLIN <marc@merlins.org> wrote:

> I'm one of those people who uses cp -al and rsync to do backups.
> Indeed, I should likely rework the flow to use subvolumes and
> snapshots.  You also mentioned reflinks, and it sounds like I can
> use cp -a --reflink instead of cp -al.
>
> Also, would the dedupe code in btrfs effectively allow for the same
> thing after the fact if you use cp without --reflink?  Is it stable
> enough nowadays?

You may want to try my backup script as a starting point:

https://gist.github.com/kakra/5520370

It uses a scratch area to create a mirror of your system, then -- if
the run was successful -- takes a snapshot of it.  The next time the
backup runs, the scratch area is updated with rsync, using its
features for in-file delta updates, so you get optimally small deltas
between snapshots.  (In local mode rsync would normally create a new
copy of each file, so enabling these features is important.)

My script is meant to be run from systemd with an auto-mounted backup
disk, but it should be easy to adapt to cron if you need to; all the
work is done within the bash script.

You may wonder about the scratch area.  I took this route to ensure
that snapshots always contain clean and consistent backups: if there
was a problem during rsync, there will be no snapshot.  It's that
easy.  Other solutions I have seen usually take a snapshot first and
then modify that snapshot, so you never know whether a given snapshot
is clean and consistent.  With my approach you get clean snapshots,
and even if the last backup broke and you are thus missing a
snapshot, you still have the data in the scratch area to recover
from.

In my scenario the backup script is able to hold several weeks of
daily backups in the backlog.  The destination is mounted with
compress-force=zlib, so I get good compression too.  However, I have
not yet had time to implement automatic cleanup of old snapshots;
feel free to add it.

HTH
Kai
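Condensed to its essentials, the flow Kai describes looks roughly
like the sketch below.  This is not the actual gist; the paths, rsync
options, and snapshot naming are assumptions:

    #!/bin/bash
    # sketch of the rsync-then-snapshot flow; see the gist above for
    # the real script
    SRC=/                              # system to back up (assumed)
    SCRATCH=/mnt/backup/scratch        # writable mirror subvolume
    SNAPDIR=/mnt/backup/snapshots

    # --inplace and --no-whole-file enable rsync's in-file delta
    # updates even for local copies, keeping the COW deltas between
    # successive snapshots small
    rsync -aHAX --delete --inplace --no-whole-file --one-file-system \
        "$SRC" "$SCRATCH/" || exit 1

    # snapshot only if rsync succeeded, so every snapshot that exists
    # is a clean, consistent backup
    btrfs subvolume snapshot -r "$SCRATCH" "$SNAPDIR/$(date +%F)"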