Duan, Ronghui
2012-Aug-16 10:22 UTC
[RFC v1 0/5] VBD: enlarge max segment per request in blkfront
Hi, list.

The max number of segments per request in the VBD queue is 11, while for
native Linux and other VMMs the parameter defaults to 128. This is likely
caused by the limited size of the ring between frontend and backend. So I
wonder whether we can put the segment data into a separate ring and let each
request dynamically use as many entries as it needs. Here is a prototype
which has not been tested much, but it works on a Linux 3.4.6 64-bit kernel.
I can see CPU% reduced to 1/3 of the original in the sequential test.
However, it brings some overhead which makes random I/O's CPU utilization
increase a little.

Here is a short summary of the data, using only 1K random reads and 64K
sequential reads in direct mode, with a physical SSD as the blkback device.
CPU% is taken from xentop.

Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
W                 52005.9   86.6       71
W/O               52123.1   85.8       66.9

Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
W                 250       27.1       10.6
W/O               250       62.6       31.1

The patches would be simple if we only had to support the new method. But we
need to consider that a user may run a new kernel as the backend while an
older one is the frontend, and we also need to handle the live migration
case. So the change becomes large...

[RFC v1 1/5]
In order to add the new segment ring, refactor the original code and split
out some methods related to ring operations.
[RFC v1 2/5]
Add the segment ring support in blkfront. Most of the code is about
suspend/resume.
[RFC v1 3/5]
Likewise, refactor the original code in blkback.
[RFC v1 4/5]
In order to support different ring types in blkback, make the pending_req
list per disk.
[RFC v1 5/5]
Add the segment ring support in blkback.

-ronghui
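For readers new to the blkif protocol, here is a minimal sketch of what a
separate segment ring could look like. Everything here is illustrative: the
structure names, field names, the 8-byte descriptor layout, and the
single-page sizing are assumptions for this sketch, not the interface
defined by the RFC patches (which, per the thread, hold 1024 segments in
their separate ring).

/* Sketch only: requests on the existing ring stop carrying the inline
 * seg[11] array and instead point at a run of segment descriptors that
 * live in a second shared page. */

#include <stdint.h>

typedef uint32_t grant_ref_t;

struct blkif_segment_entry {
	grant_ref_t gref;        /* grant for one 4K data frame            */
	uint8_t     first_sect;  /* first 512-byte sector to transfer      */
	uint8_t     last_sect;   /* last 512-byte sector (inclusive)       */
	uint16_t    _pad;
};

/* One shared 4K page holds 4096 / 8 = 512 such descriptors. */
#define SEG_PAGE_ENTRIES (4096 / sizeof(struct blkif_segment_entry))

struct blkif_request_v2 {
	uint8_t  operation;      /* BLKIF_OP_READ / BLKIF_OP_WRITE / ...   */
	uint8_t  nr_segments;    /* may now exceed the classic limit of 11 */
	uint16_t handle;         /* virtual device handle                  */
	uint64_t id;             /* echoed back in the response            */
	uint64_t sector_number;  /* start sector on the virtual disk       */
	uint32_t seg_index;      /* first descriptor used by this request
	                            in the segment page                    */
};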
Jan Beulich
2012-Aug-16 11:14 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 16.08.12 at 12:22, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> The max number of segments per request in the VBD queue is 11, while for
> native Linux and other VMMs the parameter defaults to 128. This is likely
> caused by the limited size of the ring between frontend and backend. So I
> wonder whether we can put the segment data into a separate ring and let
> each request dynamically use as many entries as it needs. Here is a
> prototype which has not been tested much, but it works on a Linux 3.4.6
> 64-bit kernel. I can see CPU% reduced to 1/3 of the original in the
> sequential test. However, it brings some overhead which makes random
> I/O's CPU utilization increase a little.

How to improve blkback is intended to be the subject of a discussion at the
summit - are you by any chance going to be there? Fact is that there are a
number of other extensions to the interface, and since you don't mention
those I'm assuming you were not aware of them.

Jan
Konrad Rzeszutek Wilk
2012-Aug-16 13:34 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> Hi, list.
> The max number of segments per request in the VBD queue is 11, while for
> native Linux and other VMMs the parameter defaults to 128.

Like the FreeBSD one?

> This is likely caused by the limited size of the ring between frontend and
> backend. So I wonder whether we can put the segment data into a separate
> ring and let each request dynamically use as many entries as it needs.
> Here is a prototype which has not been tested much, but it works on a
> Linux 3.4.6 64-bit kernel. I can see CPU% reduced to 1/3 of the original
> in the sequential test. However, it brings some overhead which makes
> random I/O's CPU utilization increase a little.

Did you think also about expanding the ring size to something bigger?

> Here is a short summary of the data, using only 1K random reads and 64K
> sequential reads in direct mode, with a physical SSD as the blkback
> device. CPU% is taken from xentop.
>
> Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> W                 52005.9   86.6       71
> W/O               52123.1   85.8       66.9
>
> Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
> W                 250       27.1       10.6
> W/O               250       62.6       31.1
>
> The patches would be simple if we only had to support the new method. But
> we need to consider that a user may run a new kernel as the backend while
> an older one is the frontend, and we also need to handle the live
> migration case. So the change becomes large...

OK? I think you are implementing the extension documented in

changeset:   24875:a59c1dcfe968
user:        Justin T. Gibbs <justing@spectralogic.com>
date:        Thu Feb 23 10:03:07 2012 +0000
summary:     blkif.h: Define and document the request number/size/segments extension

changeset:   24874:f9789db96c39
user:        Justin T. Gibbs <justing@spectralogic.com>
date:        Thu Feb 23 10:02:30 2012 +0000
summary:     blkif.h: Document the Red Hat and Citrix blkif multi-page ring extensions

so that would be the max-requests-segments one?

> [RFC v1 1/5]
> In order to add the new segment ring, refactor the original code and split
> out some methods related to ring operations.
> [RFC v1 2/5]
> Add the segment ring support in blkfront. Most of the code is about
> suspend/resume.
> [RFC v1 3/5]
> Likewise, refactor the original code in blkback.
> [RFC v1 4/5]
> In order to support different ring types in blkback, make the pending_req
> list per disk.

Not sure why you structured the patches this way, but it might make sense to
order them 1, 3, 4, 2, 5. The 'pending_req'-per-disk change is an overall
improvement that fixes a lot of concurrency issues. I tried to implement
this and ran into an issue with grants still being active. Did you have
issues with that, or did it work just fine for you?

> [RFC v1 5/5]
> Add the segment ring support in blkback.

So .. where are the patches? Did I miss them?

> -ronghui
Konrad Rzeszutek Wilk
2012-Aug-16 13:55 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 09:34:57AM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> > [RFC v1 5/5]
> > Add the segment ring support in blkback.
>
> So .. where are the patches? Did I miss them?

Ah, they just arrived.

I took a brief look at them, and I think they are the right step. The thing
that is missing is the kfree in 4/5 when the disk goes away. Also some code
is commented out, and it's not clear to me why that is.

Lastly, this protocol should be negotiated using the 'max-request-..' key or
whichever is the proper one, not blkfront-ring-type. It would also be good
to CC Justin as he might have some guidance on this and could also test the
frontend against his backend (or vice versa). Not sure what is involved in
setting up the FreeBSD backend that Spectra Logic is using..
Though this might also involve expanding the ring to be a multi-page one, I
think? And I wonder if you need to have such a huge list of ops - can some
of them be trimmed down? The v1 and v2 ones look quite similar. Oh, and
instead of v1 and v2 I would just call them 'large_segment' and
'default_segment'. Or 'lgr_segment' and 'def_segment' perhaps? Maybe
'huge_segment' and 'generic_segment' - that sounds better.

Lastly, it's not clear to me why you are removing the padding on some of the
older blkif structures?

Thanks for posting this!
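As a rough illustration of the negotiation being suggested, the sketch below
has blkfront read a backend-advertised segment limit from xenstore and fall
back to the classic value when the key is absent. The key name
"max-request-segments", the helper name, and the 128 ceiling are assumptions
taken from this thread, not the actual patches; xenbus_scanf() is the
standard Linux xenbus helper.

#include <linux/kernel.h>
#include <xen/xenbus.h>

#define CLASSIC_MAX_SEGMENTS 11		/* classic per-request limit */

static unsigned int blkfront_negotiate_segments(struct xenbus_device *dev)
{
	unsigned int max_segs;
	int err;

	/* Read the limit the backend advertised in its xenstore directory. */
	err = xenbus_scanf(XBT_NIL, dev->otherend,
			   "max-request-segments", "%u", &max_segs);
	if (err != 1 || max_segs == 0)
		/* Old backend that never wrote the key: stay at 11. */
		return CLASSIC_MAX_SEGMENTS;

	/* Clamp to what this frontend was built to handle. */
	return min_t(unsigned int, max_segs, 128);
}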
Jan Beulich
2012-Aug-16 14:18 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 16.08.12 at 15:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
>> This is likely caused by the limited size of the ring between frontend
>> and backend. So I wonder whether we can put the segment data into a
>> separate ring and let each request dynamically use as many entries as it
>> needs. Here is a prototype which has not been tested much, but it works
>> on a Linux 3.4.6 64-bit kernel. I can see CPU% reduced to 1/3 of the
>> original in the sequential test. However, it brings some overhead which
>> makes random I/O's CPU utilization increase a little.
>
> Did you think also about expanding the ring size to something bigger?

If there's a separate ring for the segment data, then there's room for quite
a few more entries in the request ring page, so I don't see an immediate
need to also increase that one.

Jan
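Jan's point can be backed by a rough capacity calculation. The numbers below
assume the classic 64-bit blkif ABI (8-byte segment descriptors, 11 inline
segments, roughly 112 bytes per request slot) and only approximate the ring
header size, so treat them as back-of-the-envelope figures rather than
values taken from the patches.

#include <stdio.h>

#define RING_PAGE    4096u
#define RING_HDR       64u   /* approximate shared-ring bookkeeping header */
#define SEG_DESC        8u   /* grant ref + first/last sector + padding    */
#define INLINE_SEGS    11u   /* classic per-request segment limit          */
#define REQ_FIXED      24u   /* operation, nr_segments, handle, id, sector */

int main(void)
{
	unsigned int classic_req = REQ_FIXED + INLINE_SEGS * SEG_DESC; /* ~112 */

	/* ~36 raw; the ring macros round down to a power of two, giving 32. */
	printf("request slots with inline segments : %u\n",
	       (RING_PAGE - RING_HDR) / classic_req);

	/* Without inline segments the same page holds far more slots, so the
	 * request ring stops being the limiting factor. */
	printf("request slots without inline segs  : %u\n",
	       (RING_PAGE - RING_HDR) / REQ_FIXED);

	/* A dedicated 4K page of 8-byte descriptors holds: */
	printf("segment descriptors per extra page : %u\n",
	       RING_PAGE / SEG_DESC);
	return 0;
}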
Duan, Ronghui
2012-Aug-17 01:12 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> >>> On 16.08.12 at 12:22, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> > The max number of segments per request in the VBD queue is 11, while for
> > native Linux and other VMMs the parameter defaults to 128. This is
> > likely caused by the limited size of the ring between frontend and
> > backend. So I wonder whether we can put the segment data into a separate
> > ring and let each request dynamically use as many entries as it needs.
>
> How to improve blkback is intended to be the subject of a discussion at
> the summit - are you by any chance going to be there? Fact is that there
> are a number of other extensions to the interface, and since you don't
> mention those I'm assuming you were not aware of them.

It is a pity that I can't be there. Indeed, I had not seen the other
extensions before. I started on this after seeing the large CPU overhead in
sequential I/O. I first tried a multi-page ring and got a lot of help from
Stefano, and then had the idea of adding a separate ring just for segments.
Stefano suggested sending it here to get advice.
Duan, Ronghui
2012-Aug-17 01:26 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> On Thu, Aug 16, 2012 at 09:34:57AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> > > Hi, list.
> > > The max number of segments per request in the VBD queue is 11, while
> > > for native Linux and other VMMs the parameter defaults to 128.
> >
> > Like the FreeBSD one?

Yeap.

> > > This is likely caused by the limited size of the ring between frontend
> > > and backend. So I wonder whether we can put the segment data into a
> > > separate ring and let each request dynamically use as many entries as
> > > it needs.
> >
> > Did you think also about expanding the ring size to something bigger?

A separate ring will hold 1024 segments; I think that can feed most
hardware's bandwidth.

> > OK? I think you are implementing the extension documented in
> >
> > changeset:   24875:a59c1dcfe968
> > user:        Justin T. Gibbs <justing@spectralogic.com>
> > date:        Thu Feb 23 10:03:07 2012 +0000
> > summary:     blkif.h: Define and document the request number/size/segments extension
> >
> > changeset:   24874:f9789db96c39
> > user:        Justin T. Gibbs <justing@spectralogic.com>
> > date:        Thu Feb 23 10:02:30 2012 +0000
> > summary:     blkif.h: Document the Red Hat and Citrix blkif multi-page ring extensions
> >
> > so that would be the max-requests-segments one?

Oh, I missed this info. But yes, I do increase max-request-segments.

> I took a brief look at them, and I think they are the right step. The
> thing that is missing is the kfree in 4/5 when the disk goes away. Also
> some code is commented out, and it's not clear to me why that is.

I forgot to clean that up. I wanted to hear advice on the protocol change
first; I can send out a cleaned-up patch afterwards.

> Lastly, this protocol should be negotiated using the 'max-request-..' key
> or whichever is the proper one, not blkfront-ring-type. It would also be
> good to CC Justin as he might have some guidance on this and could also
> test the frontend against his backend (or vice versa). Not sure what is
> involved in setting up the FreeBSD backend that Spectra Logic is using..
> Though this might also involve expanding the ring to be a multi-page one,
> I think?

I started this work from a multi-page ring, and I also have a multi-page
ring patch on top of this one, but since it had no real positive influence
on performance I dropped it.

> And I wonder if you need to have such a huge list of ops - can some of
> them be trimmed down?

Yes, it could be fewer ops by adding a common structure like the backend
has.

> The v1 and v2 ones look quite similar. Oh, and instead of v1 and v2 I
> would just call them 'large_segment' and 'default_segment'. Or
> 'lgr_segment' and 'def_segment' perhaps? Maybe 'huge_segment' and
> 'generic_segment' - that sounds better.

Fine with me, I will reconsider a suitable name.

> Lastly, it's not clear to me why you are removing the padding on some of
> the older blkif structures?

It causes a size mismatch between a 64-bit DomU and a 32-bit Dom0, so I
removed it. I will double-check the alignment again.

> Thanks for posting this!
Konrad Rzeszutek Wilk
2012-Sep-07 17:49 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> Here is a short summary of the data, using only 1K random reads and 64K
> sequential reads in direct mode, with a physical SSD as the blkback
> device. CPU% is taken from xentop.
>
> Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> W                 52005.9   86.6       71
> W/O               52123.1   85.8       66.9

So I am getting some different numbers. I tried a simple 4K read:

[/dev/xvda1]
bssplit=4K
rw=read
direct=1
size=4g
ioengine=libaio
iodepth=64

And with your patch got:
read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec

without:
read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec

> Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
> W                 250       27.1       10.6
> W/O               250       62.6       31.1

Hadn't tried that yet.
Duan, Ronghui
2012-Sep-13 02:28 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> > Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> > W                 52005.9   86.6       71
> > W/O               52123.1   85.8       66.9
>
> So I am getting some different numbers. I tried a simple 4K read:
>
> [/dev/xvda1]
> bssplit=4K
> rw=read
> direct=1
> size=4g
> ioengine=libaio
> iodepth=64
>
> And with your patch got:
> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
>
> without:
> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec

What type of backend file are you using? In order to remove the influence of
the cache in Dom0, I use a physical partition as the backend.

At the code level, although there is some overhead in my patch compared to
the original, it should not be that big. :)

-ronghui
Jan Beulich
2012-Sep-13 07:32 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 13.09.12 at 04:28, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
>> And with your patch got:
>> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
>>
>> without:
>> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
>
> What type of backend file are you using? In order to remove the influence
> of the cache in Dom0, I use a physical partition as the backend.

But you certainly shouldn't be proposing features, enabled unconditionally
or by default, that benefit one class of backing devices and severely
penalize others.

Jan
Stefano Stabellini
2012-Sep-13 11:05 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, 13 Sep 2012, Jan Beulich wrote:
> >>> On 13.09.12 at 04:28, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> >> And with your patch got:
> >> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> >>
> >> without:
> >> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
> >
> > What type of backend file are you using? In order to remove the
> > influence of the cache in Dom0, I use a physical partition as the
> > backend.
>
> But you certainly shouldn't be proposing features, enabled unconditionally
> or by default, that benefit one class of backing devices and severely
> penalize others.

Right.
I am wondering.. Considering that the in-kernel blkback is mainly used with
physical partitions, is it possible that your patches cause a regression
with unmodified backends that don't support the new protocol, like QEMU for
example?
Konrad Rzeszutek Wilk
2012-Sep-13 13:21 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Sep 13, 2012 at 02:28:12AM +0000, Duan, Ronghui wrote:
> > So I am getting some different numbers. I tried a simple 4K read:
> >
> > [/dev/xvda1]
> > bssplit=4K
> > rw=read
> > direct=1
> > size=4g
> > ioengine=libaio
> > iodepth=64
> >
> > And with your patch got:
> > read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> >
> > without:
> > read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
>
> What type of backend file are you using? In order to remove the influence
> of the cache in Dom0, I use a physical partition as the backend.
> At the code level, although there is some overhead in my patch compared to
> the original, it should not be that big. :)

Right :-)

phy:/dev/sda,xvda,w

The sda is a Corsair SSD.
Konrad Rzeszutek Wilk
2012-Sep-13 13:23 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Sep 13, 2012 at 12:05:35PM +0100, Stefano Stabellini wrote:
> On Thu, 13 Sep 2012, Jan Beulich wrote:
> > But you certainly shouldn't be proposing features, enabled
> > unconditionally or by default, that benefit one class of backing devices
> > and severely penalize others.
>
> Right.
> I am wondering.. Considering that the in-kernel blkback is mainly used
> with physical partitions, is it possible that your patches cause a
> regression with unmodified backends that don't support the new protocol,
> like QEMU for example?

Well, for right now I am just using the most simple configuration to
eliminate any extra variables (stacking of components). So my "testing" has
been just on phy:/dev/sda,xvda,w with the sda being a Corsair SSD.
Duan, Ronghui
2012-Sep-13 14:05 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> > > But you certainly shouldn't be proposing features, enabled
> > > unconditionally or by default, that benefit one class of backing
> > > devices and severely penalize others.
> >
> > Right.
> > I am wondering.. Considering that the in-kernel blkback is mainly used
> > with physical partitions, is it possible that your patches cause a
> > regression with unmodified backends that don't support the new protocol,
> > like QEMU for example?
>
> Well, for right now I am just using the most simple configuration to
> eliminate any extra variables (stacking of components). So my "testing"
> has been just on phy:/dev/sda,xvda,w with the sda being a Corsair SSD.

I totally agree that we should not break others when enabling what we want.
But to my mind the patch only adds a little overhead to the front/backend
code path, so it should only slow pure random I/O down a little. I tried the
4K read case and got just 50MB/s without the patch; I need a more powerful
disk to verify this.

Ronghui
Duan, Ronghui
2012-Sep-17 06:33 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
At last, I saw the regression in random I/O.

This is a patch to fix the performance regression. Originally the pending
request members were allocated from a static pool; in my last patch I
allocated them as each request arrived, and that hurts performance. In this
fix I allocate all of them when blkback initializes. But due to some bug
there we can't free them, the same as for the other pending request members.
I am looking for the reason, but have no idea yet.

Konrad, thanks for your comments. Could you have a try when you have time?

-ronghui
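The fix described above amounts to moving the allocation out of the request
path and into backend initialization. Below is a minimal sketch of that
pattern; the structure and function names are hypothetical, and the real
blkback pending_req also carries grant handles and pages, which is exactly
where the freeing trouble mentioned above comes from.

#include <linux/slab.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical per-disk pool, allocated once instead of per request. */
struct pending_req {
	struct list_head free_list;
	/* ... grant handles, pages and unmap state would live here ... */
};

struct blkback_pool {
	struct pending_req *reqs;	/* backing array, freed on teardown */
	struct list_head    free;	/* currently unused entries         */
	spinlock_t          lock;
};

static int pool_init(struct blkback_pool *pool, unsigned int nr)
{
	unsigned int i;

	pool->reqs = kcalloc(nr, sizeof(*pool->reqs), GFP_KERNEL);
	if (!pool->reqs)
		return -ENOMEM;

	spin_lock_init(&pool->lock);
	INIT_LIST_HEAD(&pool->free);
	for (i = 0; i < nr; i++)
		list_add_tail(&pool->reqs[i].free_list, &pool->free);
	return 0;
}

static struct pending_req *pool_get(struct blkback_pool *pool)
{
	struct pending_req *req = NULL;

	spin_lock(&pool->lock);
	if (!list_empty(&pool->free)) {
		req = list_first_entry(&pool->free, struct pending_req,
				       free_list);
		list_del(&req->free_list);
	}
	spin_unlock(&pool->lock);
	return req;	/* NULL tells the caller to back off and retry later */
}

static void pool_put(struct blkback_pool *pool, struct pending_req *req)
{
	spin_lock(&pool->lock);
	list_add(&req->free_list, &pool->free);
	spin_unlock(&pool->lock);
}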
Konrad Rzeszutek Wilk
2012-Sep-17 14:37 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Mon, Sep 17, 2012 at 06:33:29AM +0000, Duan, Ronghui wrote:
> At last, I saw the regression in random I/O.
>
> This is a patch to fix the performance regression. Originally the pending
> request members were allocated from a static pool; in my last patch I
> allocated them as each request arrived, and that hurts performance. In
> this fix I allocate all of them when blkback initializes. But due to some
> bug there we can't free them, the same as for the other pending request
> members. I am looking for the reason, but have no idea yet.

Right. When I implemented something similar to this (allocating those pools
of pages at startup), I had the same problem of the freeing of the grant
array blowing up the machine. But... this was before

http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=2fc136eecd0c647a6b13fcd00d0c41a1a28f35a5

- which might be the fix for this.

> Konrad, thanks for your comments. Could you have a try when you have time?
>
> -ronghui
Konrad Rzeszutek Wilk
2012-Sep-19 21:11 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Mon, Sep 17, 2012 at 06:33:29AM +0000, Duan, Ronghui wrote:
> At last, I saw the regression in random I/O.
>
> This is a patch to fix the performance regression. Originally the pending
> request members were allocated from a static pool; in my last patch I
> allocated them as each request arrived, and that hurts performance. In
> this fix I allocate all of them when blkback initializes. But due to some
> bug there we can't free them, the same as for the other pending request
> members. I am looking for the reason, but have no idea yet.
>
> Konrad, thanks for your comments. Could you have a try when you have time?

Sure. I get now:

read : io=4096.0MB, bw=144258KB/s, iops=36064 , runt= 29075msec

so much better I/O.

> > > > > >> And with your patch got:
> > > > > >> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> > > > > >>
> > > > > >> without:
> > > > > >> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec