Phoronix periodically runs benchmarks on filesystems, and one thing I have noticed is that btrfs always does terribly on their fio "Intel IOMeter fileserver access pattern" benchmark:

http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2

Here, btrfs is more than 6 times slower than ext4, and about 3 times slower than XFS.

Lest we attribute it to an unavoidable downside of COW filesystems and move on...no, we cannot do that, because ZFS does well here -- btrfs is about 6 times slower than ZFS!

Note that btrfs does quite well in the other Phoronix benchmarks. It is just the fio fileserver benchmark that btrfs has problems with.

What is going on here? Why is btrfs doing so poorly?
On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> Phoronix periodically runs benchmarks on filesystems, and one thing I
> have noticed is that btrfs always does terribly on their fio "Intel
> IOMeter fileserver access pattern" benchmark:
>
> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
>
> Here, btrfs is more than 6 times slower than ext4, and about 3 times
> slower than XFS.
>
> Lest we attribute it to an unavoidable downside of COW filesystems and
> move on...no, we cannot do that, because ZFS does well here -- btrfs
> is about 6 times slower than ZFS!
>
> Note that btrfs does quite well in the other Phoronix benchmarks. It
> is just the fio fileserver benchmark that btrfs has problems with.
>
> What is going on here? Why is btrfs doing so poorly?

Excellent question, I'll get back to you on that.

Thanks,

Josef
Clemens Eisserer
2013-Aug-08 18:37 UTC
Re: Why does btrfs benchmark so badly in this case?
> What is going on here? Why is btrfs doing so poorly?

Funny thing, I was thinking exactly the same when reading the article ;)

Regards
On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> Phoronix periodically runs benchmarks on filesystems, and one thing I
> have noticed is that btrfs always does terribly on their fio "Intel
> IOMeter fileserver access pattern" benchmark:
>
> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
>
> Here, btrfs is more than 6 times slower than ext4, and about 3 times
> slower than XFS.
>
> Lest we attribute it to an unavoidable downside of COW filesystems and
> move on...no, we cannot do that, because ZFS does well here -- btrfs
> is about 6 times slower than ZFS!
>
> Note that btrfs does quite well in the other Phoronix benchmarks. It
> is just the fio fileserver benchmark that btrfs has problems with.
>
> What is going on here? Why is btrfs doing so poorly?

So the reason this workload sucks for btrfs is that we fall back to buffered IO, because fio does not do block-size-aligned writes for this workload. If you add

ba=4k

to the iometer fio file, then we go the same speed as xfs and ext4. Not a whole lot we can do about this, since unaligned writes mean we have to read in pages to COW the block properly, which is why we fall back to buffered. Once we do that, we end up with a lot of page locking stuff that gets in the way and makes us twice as slow.

Thanks,

Josef
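For illustration, here is a minimal sketch of what that change looks like, assuming a job file along the lines of fio's bundled iometer-file-access-server.fio (the exact job Phoronix runs may differ, and the bssplit values below are only representative):

# Hypothetical reconstruction of the workload plus Josef's ba=4k addition.
# The bssplit line produces mixed, mostly sub-4K request sizes, i.e. the
# unaligned O_DIRECT writes that push btrfs onto the buffered path; the
# blockalign (ba) line forces every I/O offset to a 4K boundary.
cat > iometer-aligned.fio <<'EOF'
[fileserver]
description=IOMeter fileserver-like access pattern
ioengine=libaio
direct=1
rw=randrw
rwmixread=80
bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10
iodepth=64
size=4g
ba=4k
EOF
fio iometer-aligned.fio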
On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
>> Phoronix periodically runs benchmarks on filesystems, and one thing I
>> have noticed is that btrfs always does terribly on their fio "Intel
>> IOMeter fileserver access pattern" benchmark:
>>
>> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
>
> So the reason this workload sucks for btrfs is that we fall back to
> buffered IO, because fio does not do block-size-aligned writes for this
> workload. If you add
>
> ba=4k
>
> to the iometer fio file, then we go the same speed as xfs and ext4. Not a
> whole lot we can do about this, since unaligned writes mean we have to
> read in pages to COW the block properly, which is why we fall back to
> buffered. Once we do that, we end up with a lot of page locking stuff
> that gets in the way and makes us twice as slow. Thanks,

Thanks for looking into it.

So I guess the reason that ZFS does well with that workload is that ZFS is using smaller blocks, maybe just 512B?

I wonder how common these types of non-4K-aligned workloads are. Apparently, people with such workloads should avoid btrfs, but maybe these types of workloads are very rare?
On Thu, Aug 08, 2013 at 01:23:22PM -0700, John Williams wrote:
> On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> > On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> >> Phoronix periodically runs benchmarks on filesystems, and one thing I
> >> have noticed is that btrfs always does terribly on their fio "Intel
> >> IOMeter fileserver access pattern" benchmark:
> >>
> >> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
> >
> > So the reason this workload sucks for btrfs is that we fall back to
> > buffered IO, because fio does not do block-size-aligned writes for this
> > workload. If you add
> >
> > ba=4k
> >
> > to the iometer fio file, then we go the same speed as xfs and ext4. Not a
> > whole lot we can do about this, since unaligned writes mean we have to
> > read in pages to COW the block properly, which is why we fall back to
> > buffered. Once we do that, we end up with a lot of page locking stuff
> > that gets in the way and makes us twice as slow. Thanks,
>
> Thanks for looking into it.
>
> So I guess the reason that ZFS does well with that workload is that
> ZFS is using smaller blocks, maybe just 512B?
>

Yeah, I'm not sure what ZFS does, but if you are writing over a block and the size/offset isn't aligned, then you'd see similar issues with ZFS, since it would have to read+modify+write. It is likely that ZFS is just using a smaller blocksize.

> I wonder how common these types of non-4K-aligned workloads are.
> Apparently, people with such workloads should avoid btrfs, but maybe
> these types of workloads are very rare?

So most people who use AIO/O_DIRECT have really specific setups which generally can adjust how they align stuff (databases, for example, where this would be the DB page size, and those are usually large, like 16k-32k), or virtual images, which will hopefully be doing things in block-aligned IOs, but this depends on the host OS. Like I said, there isn't a whole lot we can do about this; you can use NOCOW if you want to get around it without changing your application, or you can change the app to be blocksize-aligned.

Thanks,

Josef
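As a concrete sketch of the NOCOW route Josef mentions (assuming a btrfs filesystem mounted at /mnt/btrfs; the +C attribute only takes effect on files created after it is set, which is why it is applied to a directory here):

# Mark a directory No_COW so that files created inside it are nodatacow.
# Note this also disables data checksumming (and compression) for those files.
mkdir -p /mnt/btrfs/dbfiles
chattr +C /mnt/btrfs/dbfiles
lsattr -d /mnt/btrfs/dbfiles    # should show the 'C' attribute
# Any file created under dbfiles from now on is overwritten in place,
# so unaligned O_DIRECT writes no longer force a read-modify-write COW.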
On Aug 8, 2013, at 2:23 PM, John Williams <jwilliams4200@gmail.com> wrote:
>
> So I guess the reason that ZFS does well with that workload is that
> ZFS is using smaller blocks, maybe just 512B?

Likely. It uses a variable block size.

> I wonder how common these types of non-4K-aligned workloads are.
> Apparently, people with such workloads should avoid btrfs, but maybe
> these types of workloads are very rare?

I can't directly answer the question, but all of the typical file systems on OS X, Linux, and Windows have defaulted to 4KB block sizes for many years now, baked in at creation time. On OS X, the block size varies automatically with respect to volume size at fs creation time (it goes to 8KB block sizes above 2TB, and scales up to 1MB block sizes), but it is never less than 4KB unless the volume is manually created that way. So I'd think such workloads are rare.

I also don't know if any common-use fs has an optimization whereby just the modified sector(s) are overwritten, rather than all the sectors making up the file system block being modified.

Chris Murphy
> I also don't know if any common-use fs has an optimization whereby
> just the modified sector(s) are overwritten, rather than all the sectors
> making up the file system block being modified.

Most of them do. The generic direct IO path allows sector-sized DIO. The very first bit of do_blockdev_direct_IO() tests first for file system block size alignment, then for block device sector size alignment.

You can see this easily with dd conv=notrunc oflag=direct and blktrace.

# blockdev --getss /dev/sda
512
# blockdev --getbsz /dev/sda
4096
# blktrace -d /dev/sda -a issue -o - | blkparse -i - &

$ dd if=/dev/zero of=file bs=4096 count=1 oflag=direct conv=notrunc
  8,0    3       14    35.957320002 17941  D  WS 137297704 + 8 [dd]

$ dd if=/dev/zero of=file bs=512 count=1 oflag=direct conv=notrunc
  8,0    1        4    31.405641362 17940  D  WS 137297704 + 1 [dd]

- z
Josef Bacik <jbacik@fusionio.com> wrote:

>> So I guess the reason that ZFS does well with that workload is that
>> ZFS is using smaller blocks, maybe just 512B?
>
> Yeah, I'm not sure what ZFS does, but if you are writing over a block and
> the size/offset isn't aligned, then you'd see similar issues with ZFS,
> since it would have to read+modify+write. It is likely that ZFS is just
> using a smaller blocksize.

From what I remember, ZFS uses dynamic block sizes. However, the block size can be forced and thus tuned for workloads that require it:

http://www.joyent.com/blog/bruning-questions-zfs-record-size

Maybe that's the reason... It would be interesting to see how the benchmarks performed with a forced block size.

Regards,
Kai
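For anyone wanting to try that, a sketch of forcing the record size on a ZFS dataset (the pool/dataset name tank/bench is hypothetical, and recordsize only affects blocks written after it is changed):

# Check and force the ZFS recordsize for a dataset.
zfs get recordsize tank/bench
zfs set recordsize=4K tank/bench   # applies to newly written blocks only
# Rerunning the fio job against this dataset would show whether recordsize
# is what lets ZFS avoid the read-modify-write penalty on this workload.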
On Fri, Aug 09, 2013 at 11:35:33PM +0200, Kai Krakow wrote:
> Josef Bacik <jbacik@fusionio.com> wrote:
>
> >> So I guess the reason that ZFS does well with that workload is that
> >> ZFS is using smaller blocks, maybe just 512B?
> >
> > Yeah, I'm not sure what ZFS does, but if you are writing over a block and
> > the size/offset isn't aligned, then you'd see similar issues with ZFS,
> > since it would have to read+modify+write. It is likely that ZFS is just
> > using a smaller blocksize.
>
> From what I remember, ZFS uses dynamic block sizes. However, the block size
> can be forced and thus tuned for workloads that require it:
>
> http://www.joyent.com/blog/bruning-questions-zfs-record-size
>
> Maybe that's the reason...
>
> It would be interesting to see how the benchmarks performed with a forced
> block size.
>

When I did bs=4k in the fio job to force it to use 4k blocksizes, we performed the same as ext4 and xfs.

Thanks,

Josef