Christian Brunner
2011-Oct-26 08:12 UTC
Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
2011/10/26 Sage Weil <sage@newdream.net>:
> On Wed, 26 Oct 2011, Christian Brunner wrote:
>> >> > Christian, have you tweaked those settings in your ceph.conf? It would be
>> >> > something like 'journal dio = false'. If not, can you verify that
>> >> > directio shows true when the journal is initialized from your osd log?
>> >> > E.g.,
>> >> >
>> >> > 2011-10-21 15:21:02.026789 7ff7e5c54720 journal _open dev/osd0.journal fd 14: 104857600 bytes, block size 4096 bytes, directio = 1
>> >> >
>> >> > If directio = 1 for you, something else funky is causing those
>> >> > blkdev_fsync's...
>> >>
>> >> I've looked it up in the logs - directio is 1:
>> >>
>> >> Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open
>> >> /dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096
>> >> bytes, directio = 1
>> >
>> > Do you mind capturing an strace? I'd like to see where that blkdev_fsync
>> > is coming from.
>>
>> Here is an strace. I can see a lot of sync_file_range operations.
>
> Yeah, these all look like the flusher thread, and shouldn't be hitting
> blkdev_fsync. Can you confirm that with
>
>         filestore flusher = false
>         filestore sync flush = false
>
> you get no sync_file_range at all? I wonder if this is also perf lying
> about the call chain.

Yes, setting this makes the sync_file_range calls go away.

Is it safe to use these settings with "filestore btrfs snap = 0"?

Thanks,
Christian
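
The check discussed in this message (no sync_file_range calls once the flusher is disabled) can be scripted against saved strace output instead of being read off by eye. The following is only a minimal sketch, assuming one call per line in strace's usual text format, optionally prefixed by a PID column and a timestamp. The helper name count_syscalls and the sample lines are hypothetical and are not taken from the strace posted to the list.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, after an optional PID column
# ("1696 " or "[pid 1696] ") and an optional timestamp.
SYSCALL_RE = re.compile(
    r"^\s*(?:\[pid\s+\d+\]\s+|\d+\s+)?"                  # optional PID column
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+|\d+\.\d+\s+)?"   # optional timestamp
    r"(\w+)\("                                           # syscall name before "("
)


def count_syscalls(lines):
    """Count syscalls by name in an iterable of strace output lines.

    Lines that do not start a call (signals, exit notices and
    "<... name resumed>" continuations) are skipped, so a call that was
    interrupted and later resumed is counted once.
    """
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "1696 sync_file_range(23, 0, 4096, SYNC_FILE_RANGE_WRITE) = 0",
        '[pid  1700] 17:20:16.123456 write(15, "x", 1) = 1',
        "1696 <... sync_file_range resumed> ) = 0",
    ]
    counts = count_syscalls(sample)
    print("sync_file_range calls:", counts["sync_file_range"])
```

Running the same count on a trace taken before and after changing the flusher settings gives a direct comparison; a count of zero in the second trace corresponds to the result reported above.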
Sage Weil
2011-Oct-26 16:32 UTC
Re: ceph on btrfs [was Re: ceph on non-btrfs file systems]
On Wed, 26 Oct 2011, Christian Brunner wrote:
> 2011/10/26 Sage Weil <sage@newdream.net>:
> > On Wed, 26 Oct 2011, Christian Brunner wrote:
> >> >> > Christian, have you tweaked those settings in your ceph.conf? It would be
> >> >> > something like 'journal dio = false'. If not, can you verify that
> >> >> > directio shows true when the journal is initialized from your osd log?
> >> >> > E.g.,
> >> >> >
> >> >> > 2011-10-21 15:21:02.026789 7ff7e5c54720 journal _open dev/osd0.journal fd 14: 104857600 bytes, block size 4096 bytes, directio = 1
> >> >> >
> >> >> > If directio = 1 for you, something else funky is causing those
> >> >> > blkdev_fsync's...
> >> >>
> >> >> I've looked it up in the logs - directio is 1:
> >> >>
> >> >> Oct 25 17:20:16 os00 osd.000[1696]: 7f0016841740 journal _open
> >> >> /dev/vg01/lv_osd_journal_0 fd 15: 17179869184 bytes, block size 4096
> >> >> bytes, directio = 1
> >> >
> >> > Do you mind capturing an strace? I'd like to see where that blkdev_fsync
> >> > is coming from.
> >>
> >> Here is an strace. I can see a lot of sync_file_range operations.
> >
> > Yeah, these all look like the flusher thread, and shouldn't be hitting
> > blkdev_fsync. Can you confirm that with
> >
> >         filestore flusher = false
> >         filestore sync flush = false
> >
> > you get no sync_file_range at all? I wonder if this is also perf lying
> > about the call chain.
>
> Yes, setting this makes the sync_file_range calls go away.

Okay. That means either sync_file_range on a regular btrfs file is
triggering blkdev_fsync somewhere in btrfs, there is an extremely sneaky
bug that is mixing up file descriptors, or latencytop is lying. I'm
guessing the latter, given the other weirdness Josef and Chris were
seeing. :)

> Is it safe to use these settings with "filestore btrfs snap = 0"?

Yeah. They're purely a performance thing to push as much dirty data to
disk as quickly as possible to minimize the snapshot create latency.
You'll notice the write throughput tends to tank when you turn them off.

sage
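
For reference, the three options named in this thread can be collected into one ceph.conf fragment. The sketch below only renders that fragment as text. The option names and values come from the messages above, but placing them under an [osd] section is an assumption, and render_overrides is a hypothetical helper, not part of Ceph.

```python
def render_overrides(overrides, section="osd"):
    """Render a dict of options as an ini-style config fragment.

    Booleans are written as true/false, matching the spelling used in the
    thread; all other values are written as-is. The default section name
    "osd" is an assumption, not something stated in the thread.
    """
    lines = [f"[{section}]"]
    for key, value in overrides.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"    {key} = {value}")
    return "\n".join(lines) + "\n"


# Settings discussed in the thread: flusher disabled, btrfs snapshots off.
FLUSHER_OFF = {
    "filestore flusher": False,
    "filestore sync flush": False,
    "filestore btrfs snap": 0,
}

if __name__ == "__main__":
    print(render_overrides(FLUSHER_OFF), end="")
```

Per the reply above, this combination is safe but is expected to cost write throughput, so it is probably best treated as a diagnostic setting rather than a tuning recommendation.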