Duan, Ronghui
2012-Aug-16 10:22 UTC
[RFC v1 0/5] VBD: enlarge max segment per request in blkfront
Hi, list.

The max number of segments per request in the VBD queue is 11, while for
native Linux and other VMMs the parameter defaults to 128. This is likely
caused by the limited size of the ring between frontend and backend. So I
wonder whether we can put the segment data into a separate ring and let each
request dynamically use as many entries as it needs. Here is a prototype
which has not been tested much, but it works on a Linux 3.4.6 64-bit kernel.
I can see CPU% reduced to 1/3 of the original in the sequential test.
However, it brings some overhead which makes random I/O's CPU utilization
increase a little.

Here is a short summary of the data, using only 1K random reads and 64K
sequential reads in direct mode, with a physical SSD as the blkback device.
CPU% is taken from xentop.

Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
W                 52005.9   86.6       71
W/O               52123.1   85.8       66.9

Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
W                 250       27.1       10.6
W/O               250       62.6       31.1

The patches would be simple if we only had to support the new method. But we
need to consider that a user may run a new kernel as the backend while an
older one is the frontend, and we also need to handle the live migration
case. So the change becomes large...

[RFC v1 1/5]
In order to add the new segment ring, refactor the original code and split
out some methods related to ring operations.
[RFC v1 2/5]
Add the segment ring support in blkfront. Most of the code is about
suspend/resume.
[RFC v1 3/5]
Likewise, refactor the original code in blkback.
[RFC v1 4/5]
In order to support different ring types in blkback, make the pending_req
list per disk.
[RFC v1 5/5]
Add the segment ring support in blkback.

-ronghui
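For readers new to the blkif protocol, here is a minimal sketch of what a
separate segment ring could look like. Everything here is illustrative: the
structure names, field names, the 8-byte descriptor layout, and the
single-page sizing are assumptions for this sketch, not the interface
defined by the RFC patches (which, per the thread, hold 1024 segments in
their separate ring).

/* Sketch only: requests on the existing ring stop carrying the inline
 * seg[11] array and instead point at a run of segment descriptors that
 * live in a second shared page. */

#include <stdint.h>

typedef uint32_t grant_ref_t;

struct blkif_segment_entry {
	grant_ref_t gref;        /* grant for one 4K data frame            */
	uint8_t     first_sect;  /* first 512-byte sector to transfer      */
	uint8_t     last_sect;   /* last 512-byte sector (inclusive)       */
	uint16_t    _pad;
};

/* One shared 4K page holds 4096 / 8 = 512 such descriptors. */
#define SEG_PAGE_ENTRIES (4096 / sizeof(struct blkif_segment_entry))

struct blkif_request_v2 {
	uint8_t  operation;      /* BLKIF_OP_READ / BLKIF_OP_WRITE / ...   */
	uint8_t  nr_segments;    /* may now exceed the classic limit of 11 */
	uint16_t handle;         /* virtual device handle                  */
	uint64_t id;             /* echoed back in the response            */
	uint64_t sector_number;  /* start sector on the virtual disk       */
	uint32_t seg_index;      /* first descriptor used by this request
	                            in the segment page                    */
};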
Jan Beulich
2012-Aug-16 11:14 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 16.08.12 at 12:22, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> The max number of segments per request in the VBD queue is 11, while for
> native Linux and other VMMs the parameter defaults to 128. This is likely
> caused by the limited size of the ring between frontend and backend. So I
> wonder whether we can put the segment data into a separate ring and let
> each request dynamically use as many entries as it needs. Here is a
> prototype which has not been tested much, but it works on a Linux 3.4.6
> 64-bit kernel. I can see CPU% reduced to 1/3 of the original in the
> sequential test. However, it brings some overhead which makes random
> I/O's CPU utilization increase a little.

How to improve blkback is intended to be the subject of a discussion at the
summit - are you by any chance going to be there? Fact is that there are a
number of other extensions to the interface, and since you don't mention
those I'm assuming you were not aware of them.

Jan
Konrad Rzeszutek Wilk
2012-Aug-16 13:34 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> Hi, list.
> The max number of segments per request in the VBD queue is 11, while for
> native Linux and other VMMs the parameter defaults to 128.

Like the FreeBSD one?

> This is likely caused by the limited size of the ring between frontend and
> backend. So I wonder whether we can put the segment data into a separate
> ring and let each request dynamically use as many entries as it needs.
> Here is a prototype which has not been tested much, but it works on a
> Linux 3.4.6 64-bit kernel. I can see CPU% reduced to 1/3 of the original
> in the sequential test. However, it brings some overhead which makes
> random I/O's CPU utilization increase a little.

Did you think also about expanding the ring size to something bigger?

> Here is a short summary of the data, using only 1K random reads and 64K
> sequential reads in direct mode, with a physical SSD as the blkback
> device. CPU% is taken from xentop.
>
> Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> W                 52005.9   86.6       71
> W/O               52123.1   85.8       66.9
>
> Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
> W                 250       27.1       10.6
> W/O               250       62.6       31.1
>
> The patches would be simple if we only had to support the new method. But
> we need to consider that a user may run a new kernel as the backend while
> an older one is the frontend, and we also need to handle the live
> migration case. So the change becomes large...

OK? I think you are implementing the extension documented in

changeset:   24875:a59c1dcfe968
user:        Justin T. Gibbs <justing@spectralogic.com>
date:        Thu Feb 23 10:03:07 2012 +0000
summary:     blkif.h: Define and document the request number/size/segments extension

changeset:   24874:f9789db96c39
user:        Justin T. Gibbs <justing@spectralogic.com>
date:        Thu Feb 23 10:02:30 2012 +0000
summary:     blkif.h: Document the Red Hat and Citrix blkif multi-page ring extensions

so that would be the max-requests-segments one?

> [RFC v1 1/5]
> In order to add the new segment ring, refactor the original code and split
> out some methods related to ring operations.
> [RFC v1 2/5]
> Add the segment ring support in blkfront. Most of the code is about
> suspend/resume.
> [RFC v1 3/5]
> Likewise, refactor the original code in blkback.
> [RFC v1 4/5]
> In order to support different ring types in blkback, make the pending_req
> list per disk.

Not sure why you structured the patches this way, but it might make sense to
order them 1, 3, 4, 2, 5. The 'pending_req'-per-disk change is an overall
improvement that fixes a lot of concurrency issues. I tried to implement
this and ran into an issue with grants still being active. Did you have
issues with that, or did it work just fine for you?

> [RFC v1 5/5]
> Add the segment ring support in blkback.

So .. where are the patches? Did I miss them?

> -ronghui
Konrad Rzeszutek Wilk
2012-Aug-16 13:55 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 09:34:57AM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> > [RFC v1 5/5]
> > Add the segment ring support in blkback.
>
> So .. where are the patches? Did I miss them?

Ah, they just arrived.

I took a brief look at them, and I think they are the right step. The thing
that is missing is the kfree in 4/5 when the disk goes away. Also some code
is commented out, and it's not clear to me why that is.

Lastly, this protocol should be negotiated using the 'max-request-..' key or
whichever is the proper one, not blkfront-ring-type. It would also be good
to CC Justin as he might have some guidance on this and could also test the
frontend against his backend (or vice versa). Not sure what is involved in
setting up the FreeBSD backend that Spectra Logic is using..
Though this might also involve expanding the ring to be a multi-page one, I
think? And I wonder if you need to have such a huge list of ops - can some
of them be trimmed down? The v1 and v2 ones look quite similar. Oh, and
instead of v1 and v2 I would just call them 'large_segment' and
'default_segment'. Or 'lgr_segment' and 'def_segment' perhaps? Maybe
'huge_segment' and 'generic_segment' - that sounds better.

Lastly, it's not clear to me why you are removing the padding on some of the
older blkif structures?

Thanks for posting this!
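As a rough illustration of the negotiation being suggested, the sketch below
has blkfront read a backend-advertised segment limit from xenstore and fall
back to the classic value when the key is absent. The key name
"max-request-segments", the helper name, and the 128 ceiling are assumptions
taken from this thread, not the actual patches; xenbus_scanf() is the
standard Linux xenbus helper.

#include <linux/kernel.h>
#include <xen/xenbus.h>

#define CLASSIC_MAX_SEGMENTS 11		/* classic per-request limit */

static unsigned int blkfront_negotiate_segments(struct xenbus_device *dev)
{
	unsigned int max_segs;
	int err;

	/* Read the limit the backend advertised in its xenstore directory. */
	err = xenbus_scanf(XBT_NIL, dev->otherend,
			   "max-request-segments", "%u", &max_segs);
	if (err != 1 || max_segs == 0)
		/* Old backend that never wrote the key: stay at 11. */
		return CLASSIC_MAX_SEGMENTS;

	/* Clamp to what this frontend was built to handle. */
	return min_t(unsigned int, max_segs, 128);
}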
Jan Beulich
2012-Aug-16 14:18 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 16.08.12 at 15:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
>> This is likely caused by the limited size of the ring between frontend
>> and backend. So I wonder whether we can put the segment data into a
>> separate ring and let each request dynamically use as many entries as it
>> needs. Here is a prototype which has not been tested much, but it works
>> on a Linux 3.4.6 64-bit kernel. I can see CPU% reduced to 1/3 of the
>> original in the sequential test. However, it brings some overhead which
>> makes random I/O's CPU utilization increase a little.
>
> Did you think also about expanding the ring size to something bigger?

If there's a separate ring for the segment data, then there's room for quite
a few more entries in the request ring page, so I don't see an immediate
need to also increase that one.

Jan
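Jan's point can be backed by a rough capacity calculation. The numbers below
assume the classic 64-bit blkif ABI (8-byte segment descriptors, 11 inline
segments, roughly 112 bytes per request slot) and only approximate the ring
header size, so treat them as back-of-the-envelope figures rather than
values taken from the patches.

#include <stdio.h>

#define RING_PAGE    4096u
#define RING_HDR       64u   /* approximate shared-ring bookkeeping header */
#define SEG_DESC        8u   /* grant ref + first/last sector + padding    */
#define INLINE_SEGS    11u   /* classic per-request segment limit          */
#define REQ_FIXED      24u   /* operation, nr_segments, handle, id, sector */

int main(void)
{
	unsigned int classic_req = REQ_FIXED + INLINE_SEGS * SEG_DESC; /* ~112 */

	/* ~36 raw; the ring macros round down to a power of two, giving 32. */
	printf("request slots with inline segments : %u\n",
	       (RING_PAGE - RING_HDR) / classic_req);

	/* Without inline segments the same page holds far more slots, so the
	 * request ring stops being the limiting factor. */
	printf("request slots without inline segs  : %u\n",
	       (RING_PAGE - RING_HDR) / REQ_FIXED);

	/* A dedicated 4K page of 8-byte descriptors holds: */
	printf("segment descriptors per extra page : %u\n",
	       RING_PAGE / SEG_DESC);
	return 0;
}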
Duan, Ronghui
2012-Aug-17 01:12 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> >>> On 16.08.12 at 12:22, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> > The max number of segments per request in the VBD queue is 11, while for
> > native Linux and other VMMs the parameter defaults to 128. This is
> > likely caused by the limited size of the ring between frontend and
> > backend. So I wonder whether we can put the segment data into a separate
> > ring and let each request dynamically use as many entries as it needs.
>
> How to improve blkback is intended to be the subject of a discussion at
> the summit - are you by any chance going to be there? Fact is that there
> are a number of other extensions to the interface, and since you don't
> mention those I'm assuming you were not aware of them.

It is a pity that I can't be there. Indeed, I had not seen the other
extensions before. I started on this after seeing the large CPU overhead in
sequential I/O. I first tried a multi-page ring and got a lot of help from
Stefano, and then had the idea of adding a separate ring just for segments.
Stefano suggested sending it here to get advice.
Duan, Ronghui
2012-Aug-17 01:26 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> On Thu, Aug 16, 2012 at 09:34:57AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> > > Hi, list.
> > > The max number of segments per request in the VBD queue is 11, while
> > > for native Linux and other VMMs the parameter defaults to 128.
> >
> > Like the FreeBSD one?

Yeap.

> > > This is likely caused by the limited size of the ring between frontend
> > > and backend. So I wonder whether we can put the segment data into a
> > > separate ring and let each request dynamically use as many entries as
> > > it needs.
> >
> > Did you think also about expanding the ring size to something bigger?

A separate ring will hold 1024 segments; I think that can feed most
hardware's bandwidth.

> > OK? I think you are implementing the extension documented in
> >
> > changeset:   24875:a59c1dcfe968
> > user:        Justin T. Gibbs <justing@spectralogic.com>
> > date:        Thu Feb 23 10:03:07 2012 +0000
> > summary:     blkif.h: Define and document the request number/size/segments extension
> >
> > changeset:   24874:f9789db96c39
> > user:        Justin T. Gibbs <justing@spectralogic.com>
> > date:        Thu Feb 23 10:02:30 2012 +0000
> > summary:     blkif.h: Document the Red Hat and Citrix blkif multi-page ring extensions
> >
> > so that would be the max-requests-segments one?

Oh, I missed this info. But yes, I do increase max-request-segments.

> I took a brief look at them, and I think they are the right step. The
> thing that is missing is the kfree in 4/5 when the disk goes away. Also
> some code is commented out, and it's not clear to me why that is.

I forgot to clean that up. I wanted to hear advice on the protocol change
first; I can send out a cleaned-up patch afterwards.

> Lastly, this protocol should be negotiated using the 'max-request-..' key
> or whichever is the proper one, not blkfront-ring-type. It would also be
> good to CC Justin as he might have some guidance on this and could also
> test the frontend against his backend (or vice versa). Not sure what is
> involved in setting up the FreeBSD backend that Spectra Logic is using..
> Though this might also involve expanding the ring to be a multi-page one,
> I think?

I started this work from a multi-page ring, and I also have a multi-page
ring patch on top of this one, but since it had no real positive influence
on performance I dropped it.

> And I wonder if you need to have such a huge list of ops - can some of
> them be trimmed down?

Yes, it could be fewer ops by adding a common structure like the backend
has.

> The v1 and v2 ones look quite similar. Oh, and instead of v1 and v2 I
> would just call them 'large_segment' and 'default_segment'. Or
> 'lgr_segment' and 'def_segment' perhaps? Maybe 'huge_segment' and
> 'generic_segment' - that sounds better.

Fine with me, I will reconsider a suitable name.

> Lastly, it's not clear to me why you are removing the padding on some of
> the older blkif structures?

It causes a size mismatch between a 64-bit DomU and a 32-bit Dom0, so I
removed it. I will double-check the alignment again.

> Thanks for posting this!
Konrad Rzeszutek Wilk
2012-Sep-07 17:49 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Aug 16, 2012 at 10:22:56AM +0000, Duan, Ronghui wrote:
> Here is a short summary of the data, using only 1K random reads and 64K
> sequential reads in direct mode, with a physical SSD as the blkback
> device. CPU% is taken from xentop.
>
> Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> W                 52005.9   86.6       71
> W/O               52123.1   85.8       66.9

So I am getting some different numbers. I tried a simple 4K read:

[/dev/xvda1]
bssplit=4K
rw=read
direct=1
size=4g
ioengine=libaio
iodepth=64

And with your patch got:
read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec

without:
read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec

> Read 64K seq      BW MB/s   Dom0 CPU%  DomU CPU%
> W                 250       27.1       10.6
> W/O               250       62.6       31.1

Hadn't tried that yet.
Duan, Ronghui
2012-Sep-13 02:28 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> > Read 1K random    IOPS      Dom0 CPU%  DomU CPU%
> > W                 52005.9   86.6       71
> > W/O               52123.1   85.8       66.9
>
> So I am getting some different numbers. I tried a simple 4K read:
>
> [/dev/xvda1]
> bssplit=4K
> rw=read
> direct=1
> size=4g
> ioengine=libaio
> iodepth=64
>
> And with your patch got:
> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
>
> without:
> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec

What type of backend file are you using? In order to remove the influence of
the cache in Dom0, I use a physical partition as the backend.

At the code level, although there is some overhead in my patch compared to
the original, it should not be that big. :)

-ronghui
Jan Beulich
2012-Sep-13 07:32 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
>>> On 13.09.12 at 04:28, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
>> And with your patch got:
>> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
>>
>> without:
>> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
>
> What type of backend file are you using? In order to remove the influence
> of the cache in Dom0, I use a physical partition as the backend.

But you certainly shouldn't be proposing features, enabled unconditionally
or by default, that benefit one class of backing devices and severely
penalize others.

Jan
Stefano Stabellini
2012-Sep-13 11:05 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, 13 Sep 2012, Jan Beulich wrote:
> >>> On 13.09.12 at 04:28, "Duan, Ronghui" <ronghui.duan@intel.com> wrote:
> >> And with your patch got:
> >> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> >>
> >> without:
> >> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
> >
> > What type of backend file are you using? In order to remove the
> > influence of the cache in Dom0, I use a physical partition as the
> > backend.
>
> But you certainly shouldn't be proposing features, enabled unconditionally
> or by default, that benefit one class of backing devices and severely
> penalize others.

Right.
I am wondering.. Considering that the in-kernel blkback is mainly used with
physical partitions, is it possible that your patches cause a regression
with unmodified backends that don't support the new protocol, like QEMU for
example?
Konrad Rzeszutek Wilk
2012-Sep-13 13:21 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Sep 13, 2012 at 02:28:12AM +0000, Duan, Ronghui wrote:
> > So I am getting some different numbers. I tried a simple 4K read:
> >
> > [/dev/xvda1]
> > bssplit=4K
> > rw=read
> > direct=1
> > size=4g
> > ioengine=libaio
> > iodepth=64
> >
> > And with your patch got:
> > read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> >
> > without:
> > read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec
>
> What type of backend file are you using? In order to remove the influence
> of the cache in Dom0, I use a physical partition as the backend.
> At the code level, although there is some overhead in my patch compared to
> the original, it should not be that big. :)

Right :-)

phy:/dev/sda,xvda,w

The sda is a Corsair SSD.
Konrad Rzeszutek Wilk
2012-Sep-13 13:23 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Thu, Sep 13, 2012 at 12:05:35PM +0100, Stefano Stabellini wrote:
> On Thu, 13 Sep 2012, Jan Beulich wrote:
> > But you certainly shouldn't be proposing features, enabled
> > unconditionally or by default, that benefit one class of backing devices
> > and severely penalize others.
>
> Right.
> I am wondering.. Considering that the in-kernel blkback is mainly used
> with physical partitions, is it possible that your patches cause a
> regression with unmodified backends that don't support the new protocol,
> like QEMU for example?

Well, for right now I am just using the most simple configuration to
eliminate any extra variables (stacking of components). So my "testing" has
been just on phy:/dev/sda,xvda,w with the sda being a Corsair SSD.
Duan, Ronghui
2012-Sep-13 14:05 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
> > > But you certainly shouldn't be proposing features, enabled
> > > unconditionally or by default, that benefit one class of backing
> > > devices and severely penalize others.
> >
> > Right.
> > I am wondering.. Considering that the in-kernel blkback is mainly used
> > with physical partitions, is it possible that your patches cause a
> > regression with unmodified backends that don't support the new protocol,
> > like QEMU for example?
>
> Well, for right now I am just using the most simple configuration to
> eliminate any extra variables (stacking of components). So my "testing"
> has been just on phy:/dev/sda,xvda,w with the sda being a Corsair SSD.

I totally agree that we should not break others when enabling what we want.
But to my mind the patch only adds a little overhead to the front/backend
code path, so it should only slow pure random I/O down a little. I tried the
4K read case and got just 50MB/s without the patch; I need a more powerful
disk to verify this.

Ronghui
Duan, Ronghui
2012-Sep-17 06:33 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
At last, I saw the regression in random I/O.

This is a patch to fix the performance regression. Originally the pending
request members were allocated from a static pool; in my last patch I
allocated them as each request arrived, and that hurts performance. In this
fix I allocate all of them when blkback initializes. But due to some bug
there we can't free them, the same as for the other pending request members.
I am looking for the reason, but have no idea yet.

Konrad, thanks for your comments. Could you have a try when you have time?

-ronghui
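The fix described above amounts to moving the allocation out of the request
path and into backend initialization. Below is a minimal sketch of that
pattern; the structure and function names are hypothetical, and the real
blkback pending_req also carries grant handles and pages, which is exactly
where the freeing trouble mentioned above comes from.

#include <linux/slab.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical per-disk pool, allocated once instead of per request. */
struct pending_req {
	struct list_head free_list;
	/* ... grant handles, pages and unmap state would live here ... */
};

struct blkback_pool {
	struct pending_req *reqs;	/* backing array, freed on teardown */
	struct list_head    free;	/* currently unused entries         */
	spinlock_t          lock;
};

static int pool_init(struct blkback_pool *pool, unsigned int nr)
{
	unsigned int i;

	pool->reqs = kcalloc(nr, sizeof(*pool->reqs), GFP_KERNEL);
	if (!pool->reqs)
		return -ENOMEM;

	spin_lock_init(&pool->lock);
	INIT_LIST_HEAD(&pool->free);
	for (i = 0; i < nr; i++)
		list_add_tail(&pool->reqs[i].free_list, &pool->free);
	return 0;
}

static struct pending_req *pool_get(struct blkback_pool *pool)
{
	struct pending_req *req = NULL;

	spin_lock(&pool->lock);
	if (!list_empty(&pool->free)) {
		req = list_first_entry(&pool->free, struct pending_req,
				       free_list);
		list_del(&req->free_list);
	}
	spin_unlock(&pool->lock);
	return req;	/* NULL tells the caller to back off and retry later */
}

static void pool_put(struct blkback_pool *pool, struct pending_req *req)
{
	spin_lock(&pool->lock);
	list_add(&req->free_list, &pool->free);
	spin_unlock(&pool->lock);
}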
Konrad Rzeszutek Wilk
2012-Sep-17 14:37 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Mon, Sep 17, 2012 at 06:33:29AM +0000, Duan, Ronghui wrote:
> At last, I saw the regression in random I/O.
>
> This is a patch to fix the performance regression. Originally the pending
> request members were allocated from a static pool; in my last patch I
> allocated them as each request arrived, and that hurts performance. In
> this fix I allocate all of them when blkback initializes. But due to some
> bug there we can't free them, the same as for the other pending request
> members. I am looking for the reason, but have no idea yet.

Right. When I implemented something similar to this (allocating those pools
of pages at startup), I had the same problem of the freeing of the grant
array blowing up the machine. But... this was before

http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=2fc136eecd0c647a6b13fcd00d0c41a1a28f35a5

- which might be the fix for this.

> Konrad, thanks for your comments. Could you have a try when you have time?
>
> -ronghui
Konrad Rzeszutek Wilk
2012-Sep-19 21:11 UTC
Re: [RFC v1 0/5] VBD: enlarge max segment per request in blkfront
On Mon, Sep 17, 2012 at 06:33:29AM +0000, Duan, Ronghui wrote:
> At last, I saw the regression in random I/O.
>
> This is a patch to fix the performance regression. Originally the pending
> request members were allocated from a static pool; in my last patch I
> allocated them as each request arrived, and that hurts performance. In
> this fix I allocate all of them when blkback initializes. But due to some
> bug there we can't free them, the same as for the other pending request
> members. I am looking for the reason, but have no idea yet.
>
> Konrad, thanks for your comments. Could you have a try when you have time?

Sure. I get now:

read : io=4096.0MB, bw=144258KB/s, iops=36064 , runt= 29075msec

so much better I/O.

> > > > > >> And with your patch got:
> > > > > >> read : io=4096.0MB, bw=92606KB/s, iops=23151 , runt= 45292msec
> > > > > >>
> > > > > >> without:
> > > > > >> read : io=4096.0MB, bw=145187KB/s, iops=36296 , runt= 28889msec