Adi Kriegisch
2011-Feb-25 16:43 UTC
[Xen-devel] blk[front|back] does not hand over disk parameters
Dear all,

(Following the XenFAQ on how to report a bug[1], I submitted this to the xen-users list[2] first, reported the bug in bugzilla[3], and now resend the text to this list. Please CC me in replies as I am not subscribed to this list. Thanks!)

I investigated a serious performance drop between Dom0 and DomU with LVM on top of RAID6 and blkback devices. While I get around 130MB/s write performance in Dom0, I only get 30MB/s in DomU. Inspecting this with dstat/iostat revealed a read rate of about 17-25MB/s while writing at around 40MB/s. The reading only occurs on the disk devices assembled into the RAID6, not on the md device itself, so it is related to RAID6 activity only.

The reason for this is recalculation of checksums due to a too small optimal_io_size:

On Dom0:
  blockdev --getiomin /dev/space/test
  524288 (which is chunk size)
  blockdev --getioopt /dev/space/test
  3145728 (which is 6*chunk size)

On DomU:
  blockdev --getiomin /dev/xvdb1
  512
  blockdev --getioopt /dev/xvdb1
  0 (so the kernel will use 1MB by default, IIRC)

minimum_io_size -- if not set -- is the hardware block size, which seems to be set to 512 in xlvbd_init_blk_queue (blkfront.c). Btw: blockdev --getbsz /dev/space/test gives 4096 on Dom0 while DomU reports 512.

I can somewhat mitigate the issue by using a much smaller chunk size, but this is IMHO just working around the issue. Another workaround could be to use a "power-of-two" number of data disks in the RAID and choose the chunk size so that a full stripe sums up to 1MB. But this is just another hack...

If there is anything I can do, please let me know!

Thanks,
Adi Kriegisch

PS: I am using a stock Debian/Squeeze kernel on top of Debian's Xen 4.0.1-2.

[1] http://wiki.xensource.com/xenwiki/XenFaq
[2] http://lists.xensource.com/archives/html/xen-users/2011-02/msg00615.html
[3] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1745

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
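The arithmetic behind the numbers in this report can be sketched as follows. This is only an illustration, not kernel or md code: it assumes 6 data disks (inferred from "6*chunk size") and a 1 MiB write size in DomU (the default recalled above with "IIRC"). The function names are hypothetical, and the real md driver may handle partial stripes in more refined ways.

```python
CHUNK_SIZE = 524288   # io_min reported in Dom0 (the RAID chunk size)
DATA_DISKS = 6        # assumption: inferred from io_opt = 6 * chunk size


def full_stripe_size(chunk_size: int, data_disks: int) -> int:
    """Bytes of data covered by one full RAID stripe (parity excluded)."""
    return chunk_size * data_disks


def is_partial_stripe_write(offset: int, length: int, stripe: int) -> bool:
    """True if a write starts or ends in the middle of a stripe.

    A partial-stripe write forces the array to read existing data or
    parity before it can recompute the checksums.
    """
    return offset % stripe != 0 or length % stripe != 0


stripe = full_stripe_size(CHUNK_SIZE, DATA_DISKS)
print(stripe)                                       # matches io_opt in Dom0
print(is_partial_stripe_write(0, 1 << 20, stripe))  # 1 MiB write from DomU
print(is_partial_stripe_write(0, stripe, stripe))   # full-stripe write
```

Under these assumptions a 1 MiB write can never cover a whole 3 MiB stripe, which would be consistent with the extra reads observed on the member disks.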
Ian Campbell
2011-Feb-28 10:06 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
On Fri, 2011-02-25 at 16:43 +0000, Adi Kriegisch wrote:
> Dear all,
>
> (following the XenFAQ on how to report a bug[1], I submitted this to
> xen-user list[2] first, reported the bug in bugzilla[3] and now resend the
> text to this list. Please CC me in replies as I am not subscribed to this
> list, Thanks!)
>
> I investigated some serious performance drop between Dom0 and DomU with
> LVM on top of RAID6 and blkback devices.
> While I have around 130MB/s write performance in Dom0, I only get 30MB/s in
> DomU. Inspecting this with dstat/iostat revealed that I have a read rate of
> about 17-25MB/s while writing with around 40MB/s.
> The reading only occurs on the disk devices assembled to the RAID6 not the
> md device itself. So this is related to RAID6 activity only.
> The reason for this is recalculation of checksums due to a too small
> optimal_io_size:
> On Dom0:
> blockdev --getiomin /dev/space/test
> 524288 (which is chunk size)
> blockdev --getioopt /dev/space/test
> 3145728 (which is 6*chunk size)
>
> On DomU:
> blockdev --getiomin /dev/xvdb1
> 512
> blockdev --getioopt /dev/xvdb1
> 0 (so the kernel will use 1MB by default, IIRC)
>
> minimum_io_size -- if not set -- is hardware block size which seems to be
> set to 512 in xlvbd_init_blk_queue (blkfront.c). Btw: blockdev --getbsz
> /dev/space/test gives 4096 on Dom0 while DomU reports 512.
>
> I can somehow mitigate the issue by using a way smaller chunk size but this
> is IMHO just working around the issue. Another workaround could be to use a
> "power-of-two" number of data disks in the raid and choose the chunk size
> to sum up to 1MB. But this is just another hack...
>
> If there is anything I can do, please let me know!

This is not the sort of thing which changes dynamically across the lifetime of a device, is it? In which case it seems like the sort of information which the backend could communicate to the frontend via xenbus at start of day. e.g. take a look at how the sector-size is passed through xenbus.

It should be trivial to add this in a compatible manner since the frontend can just do what it does today if the nodes are missing and the backend wouldn't rely on the frontend doing anything useful with the information anyway.

Can you make a patch?

Ian.

> Thanks,
> Adi Kriegisch
>
> PS: I am using a stock Debian/Squeeze kernel on top of Debian's Xen 4.0.1-2.
>
> [1] http://wiki.xensource.com/xenwiki/XenFaq
> [2] http://lists.xensource.com/archives/html/xen-users/2011-02/msg00615.html
> [3] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1745
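The compatible handshake suggested here can be sketched as follows. This is a plain-Python model, not xenbus code: a dict stands in for the backend's xenstore directory, and the node names "minimum-io-size" and "optimal-io-size" are hypothetical (only "sector-size" is mentioned in the thread). The defaults are the values DomU reports today.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_IO_MIN = 512  # what blockdev --getiomin shows in DomU today
DEFAULT_IO_OPT = 0    # 0 means "no optimal size reported"


def read_node(nodes: Mapping[str, str], key: str) -> Optional[int]:
    """Return a node's integer value, or None if missing or malformed."""
    value = nodes.get(key)
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def frontend_io_limits(backend_nodes: Mapping[str, str]) -> Tuple[int, int]:
    """Pick (io_min, io_opt), falling back to today's behaviour."""
    io_min = read_node(backend_nodes, "minimum-io-size")  # hypothetical node
    io_opt = read_node(backend_nodes, "optimal-io-size")  # hypothetical node
    return (
        io_min if io_min is not None else DEFAULT_IO_MIN,
        io_opt if io_opt is not None else DEFAULT_IO_OPT,
    )


old_backend = {"sector-size": "512"}
new_backend = {
    "sector-size": "512",
    "minimum-io-size": "524288",
    "optimal-io-size": "3145728",
}
print(frontend_io_limits(old_backend))
print(frontend_io_limits(new_backend))
```

An old backend that never writes the extra nodes leaves the frontend with its current defaults, and an old frontend simply ignores nodes it does not know about, which is the compatibility property described above.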
Jan Beulich
2011-Feb-28 10:55 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
>>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> This is not the sort of thing which changes dynamically across the
> lifetime of a device, is it? In which case it seems like the sort of
> information which the backend could communicate to the frontend via
> xenbus at start of day. e.g. take a look at how the sector-size is
> passed through xenbus.
>
> It should be trivial to add this in a compatible manner since the
> frontend can just do what it does today if the nodes are missing and the
> backend wouldn't rely on the frontend doing anything useful with the
> information anyway.

Am I right in understanding that these numbers aren't used by the block layer itself at all, but just get provided to userspace for whatever optimization it can do? In that case, I can't really see how passing through these values can really help general performance (i.e. for apps not paying attention to these values).

Confused,
Jan
Ian Campbell
2011-Feb-28 11:13 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
On Mon, 2011-02-28 at 10:55 +0000, Jan Beulich wrote:
> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > This is not the sort of thing which changes dynamically across the
> > lifetime of a device, is it? In which case it seems like the sort of
> > information which the backend could communicate to the frontend via
> > xenbus at start of day. e.g. take a look at how the sector-size is
> > passed through xenbus.
> >
> > It should be trivial to add this in a compatible manner since the
> > frontend can just do what it does today if the nodes are missing and the
> > backend wouldn't rely on the frontend doing anything useful with the
> > information anyway.
>
> Am I right in understanding that these numbers aren't used by
> the block layer itself at all, but just get provided to userspace for
> whatever optimization it can do?
>
> Confused, Jan

I had inferred from Adi's bringing them up that the kernel would actually use them in some way, but I don't actually know if that's the case...

> In that case, I can't really see
> how passing through these values can really help general
> performance (i.e. for apps not paying attention to these values).

Even if their only utility is when userspace explicitly makes use of them, they are just as useful in a Xen domU as they are in a non-Xen system, so why would we not plumb them through?

Ian.
Adi Kriegisch
2011-Feb-28 11:54 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
On Mon, Feb 28, 2011 at 10:55:12AM +0000, Jan Beulich wrote:
> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
[SNIP]
> > It should be trivial to add this in a compatible manner since the
> > frontend can just do what it does today if the nodes are missing and the
> > backend wouldn't rely on the frontend doing anything useful with the
> > information anyway.
>
> Am I right in understanding that these numbers aren't used by
> the block layer itself at all, but just get provided to userspace for
> whatever optimization it can do? In that case, I can't really see
> how passing through these values can really help general
> performance (i.e. for apps not paying attention to these values).

AFAIK these values are used by mkfs.* in userspace and by the I/O schedulers in kernel space to optimize performance. There have been some discussions about that on the kernel mailing lists[1], and there is an interesting document about that available from Mike Snitzer[2]. Those values are important for 4K block size drives, for SSDs and -- as in my case -- for RAID levels with checksums.

A quick test with a Samba server installed in Dom0 revealed that those values do not need to be honoured by Samba to get full write speed. The I/O scheduler seems to be the one that needs those values.

-- Adi

[1] http://marc.info/?l=linux-ide&m=124058535512850&w=4
[2] http://people.redhat.com/msnitzer/docs/io-limits.txt
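As a rough illustration of why mkfs-style tools care about these values, the sketch below converts io_min and io_opt into a stride and a stripe width expressed in filesystem blocks. It is modelled loosely on the stride/stripe-width idea used for striped storage; the exact derivation used by any particular mkfs is an assumption here, as are the function name and the 4096-byte block size (taken from the Dom0 --getbsz output earlier in the thread).

```python
from typing import Tuple


def stripe_geometry(io_min: int, io_opt: int,
                    block_size: int = 4096) -> Tuple[int, int]:
    """Return (stride, stripe_width) in filesystem blocks.

    A result of 0 means the device gave no usable hint, so a tool
    would have nothing to align its on-disk structures to.
    """
    stride = io_min // block_size        # blocks per RAID chunk
    stripe_width = io_opt // block_size  # blocks per full data stripe
    return stride, stripe_width


print(stripe_geometry(524288, 3145728))  # values seen in Dom0
print(stripe_geometry(512, 0))           # values seen in DomU
```

With the Dom0 values a tool gets a usable geometry, while the DomU values yield no hint at all, so a filesystem created inside the guest cannot be laid out to match the RAID underneath.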
Jan Beulich
2011-Feb-28 12:51 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
>>> On 28.02.11 at 12:54, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
> On Mon, Feb 28, 2011 at 10:55:12AM +0000, Jan Beulich wrote:
>> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> [SNIP]
>> > It should be trivial to add this in a compatible manner since the
>> > frontend can just do what it does today if the nodes are missing and the
>> > backend wouldn't rely on the frontend doing anything useful with the
>> > information anyway.
>>
>> Am I right in understanding that these numbers aren't used by
>> the block layer itself at all, but just get provided to userspace for
>> whatever optimization it can do? In that case, I can't really see
>> how passing through these values can really help general
>> performance (i.e. for apps not paying attention to these values).
> AFAIK these values are used by mkfs.* in userspace and by the I/O Schedulers
> in kernel space to optimize performance. There have been some discussions

I grepped for io_min and a couple of derived variables (like alignment_offset) but couldn't spot any I/O-relevant readers under block/.

> about
> that on the kernel mailing lists[1] and there is an interesting document
> about
> that available from Mike Snitzer[2].

I'll take a look at those.

Jan
Jan Beulich
2011-Feb-28 12:54 UTC
Re: [Xen-devel] blk[front|back] does not hand over disk parameters
>>> On 28.02.11 at 12:13, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:
> On Mon, 2011-02-28 at 10:55 +0000, Jan Beulich wrote:
>> >>> On 28.02.11 at 11:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > This is not the sort of thing which changes dynamically across the
>> > lifetime of a device, is it? In which case it seems like the sort of
>> > information which the backend could communicate to the frontend via
>> > xenbus at start of day. e.g. take a look at how the sector-size is
>> > passed through xenbus.
>> >
>> > It should be trivial to add this in a compatible manner since the
>> > frontend can just do what it does today if the nodes are missing and the
>> > backend wouldn't rely on the frontend doing anything useful with the
>> > information anyway.
>>
>> Am I right in understanding that these numbers aren't used by
>> the block layer itself at all, but just get provided to userspace for
>> whatever optimization it can do?
>>
>> Confused, Jan
>
> I had inferred from Adi's bringing them up that the kernel would
> actually use them in some way, but I don't actually know if that's the
> case...
>
>> In that case, I can't really see
>> how passing through these values can really help general
>> performance (i.e. for apps not paying attention to these values).
>
> Even if their only utility is when userspace explicitly makes use of them,
> they are just as useful in a Xen domU as they are in a non-Xen system,
> so why would we not plumb them through?

Plumbing through can be easily done, indeed, but the question is if that gets us any performance improvement. If these are only for allowing user mode optimizations, then maybe. If they're to control kernel behavior, then the 11-pages-per-request limitation of the blkif protocol would likely make this exercise pretty pointless I would think.

Jan
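The 11-pages-per-request point can be put into numbers with a short sketch. It assumes 4096-byte pages (typical for x86, but an assumption here) and reuses the chunk and stripe sizes reported at the start of the thread; the function names are hypothetical.

```python
import math

SEGMENTS_PER_REQUEST = 11  # the blkif per-request limit mentioned above
PAGE_SIZE = 4096           # assumption: 4 KiB pages


def max_request_bytes(segments: int = SEGMENTS_PER_REQUEST,
                      page_size: int = PAGE_SIZE) -> int:
    """Largest payload a single blkif request can carry."""
    return segments * page_size


def requests_needed(total_bytes: int) -> int:
    """How many maximal requests are needed to cover total_bytes."""
    return math.ceil(total_bytes / max_request_bytes())


print(max_request_bytes())        # bytes per request
print(requests_needed(524288))    # requests to cover one RAID chunk
print(requests_needed(3145728))   # requests to cover one full stripe
```

Under these assumptions one request carries 44 KiB, so a single full stripe is split across dozens of ring requests. Whether the backend merges them again before they reach the md device is not settled in this thread.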