I have installed Xen and Linux 2.6.10 on three different machines. The slowest of them is my computer at home, an Athlon XP 1600+ (1.4 GHz) with 256 MB RAM.

My problem is reduced file-system performance in domU guests. These guests run faster when I use loopback files on Dom0 than they do when I use real partitions and populate them with a Linux system.

I found out that dom0 file-system IO and raw IO (using dd as a tool to test throughput from the disk) is almost exactly the same as with a standard Linux kernel without Xen. But the raw IO from DomU to an unused disk (a second disk in the system) is limited to forty percent of the speed I get within Dom0. The ratio is about the same when doing real file-system IO. I found this symptom on all of the systems I installed.

An early paper about Xen states that the penalty when using VBDs is close to zero and negligible. This conflicts with the results I got, and I believe it indicates that something in my configuration is wrong (at least I hope so). I have the drivers for my chipset linked into the kernel, and hdparm tells me that DMA is enabled for the disks in use (using hdparm under Dom0).

What worries me is that the results within Dom0 are completely satisfactory, while those in DomU are not. Do I have to change the kernel config for DomU? Or is there any special option I have to set in the kernel configuration for Dom0, or even for Xen itself? I have compiled version 2.0.5 - the newest available, to my knowledge.

Any hints?
> I found out that dom0 does file-system IO and raw IO (using dd as a tool
> to test throughput from the disk) is about exactly the same as when using
> a standard Linux kernel without Xen. But the raw IO from DomU to an unused
> disk (a second disk in the system) is limited to forty percent of the
> speed I get within Dom0.

Just to be clear: you're doing a dd performance test within dom0 to the exact same partition on the 2nd disk that you're using when you start the domU, and finding that the domU dd performance is 40% of the dom0 performance?

I've not heard of anyone else having problems like this. What happens if you use a partition on the 1st disk?

What chipset is the IDE controller? What device (e.g. sda1) are you exporting the disk partition into the domU as?

Are you sure dom0 is idle when doing the dd test in the domU?

Ian
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU, and finding that the domU dd performance is 40% of the dom0
> performance?
>
> I've not heard of anyone else having problems like this. What happens if
> you use a partition on the 1st disk?
>
> What chipset is the IDE controller? What device (e.g. sda1) are you
> exporting the disk partition into the domU as?
>
> Are you sure dom0 is idle when doing the dd test in the domU?

Yes, I have tried various partitions on both disks, from both Dom0 and DomU, and the result has always been a performance ratio of 2.5 between Dom0 and DomU. Yes, I used dd for the test, but I originally came across this problem doing IO into the file system. I was surprised that switching from a loopback file as the "device" for DomU to a real device brought not just no improvement but a performance degradation. That is when I started to test raw IO performance using dd.

I am sure that the device was not busy and dom0 was idle when I did the tests. There were no busy jobs in dom0, neither CPU- nor IO-bound.

I don't know which chipset the IDE controller is; my mainboard is an MSI KT7 board. I am currently not at home and must look up what the IDE controller is. The devices I exported have been hda1 and hdb6 on my computer at home, and hdg5 in the office. In the latter case the disk is attached to a Promise202 RAID controller.

Is there any description of what I have to do to configure my system adequately to run efficiently using Xen? If such were available I might be able to locate the problem myself.

I have not yet done a dd performance test using loopback files as devices; I have only used them as file systems.

Thanks in advance,
Peter Bier
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU, and finding that the domU dd performance is 40% of the dom0
> performance?

Yes, I do the performance testing using dd. It is only a simple "benchmark", but its results seem to indicate a fundamental issue. I did the tests with the same partitions from Dom0 as from DomU. I used both disks, and Dom0 achieved 2.5 times the transfer rate of DomU in all experiments.

I do not know the chipset of the IDE controller in my computer at home, while I know that in the office it is a Promise RAID controller (I am neither at home nor in the office at the moment). I am sure that the system was idle during all tests, meaning that only the standard system was running, with no busy jobs and no user program consuming CPU or IO resources.

I am very interested in Xen, but I need to fix this problem. If there is any "checklist" on how to configure Xen efficiently, I might be able to fix the problem myself.

Thanks,
Peter Bier
> > I found out that dom0 does file-system IO and raw IO (using dd as a tool
> > to test throughput from the disk) is about exactly the same as when using
> > a standard Linux kernel without Xen. But the raw IO from DomU to an
> > unused disk (a second disk in the system) is limited to forty percent of
> > the speed I get within Dom0.

OK, this looks like a performance bug that's crept into the 2.6 dom0 somewhere along the way. I'm surprised no-one else has spotted it.

Please can you confirm that performance is OK if you use 2.4 as a dom0? (It doesn't matter what you use as guests.)

Thanks,
Ian
On Monday 28 March 2005 12:55, Ian Pratt wrote:
> > > I found out that dom0 does file-system IO and raw IO (using dd as a
> > > tool to test throughput from the disk) is about exactly the same as
> > > when using a standard Linux kernel without Xen. But the raw IO from
> > > DomU to an unused disk (a second disk in the system) is limited to
> > > forty percent of the speed I get within Dom0.

Is the second disk exactly the same as the first one? I'll try an IO test here on the same disk array with dom0 and domU and see what I get.

-Andrew

> OK, this looks like a performance bug that's crept into the 2.6 dom0
> somewhere along the way. I'm surprised no-one else has spotted it.
>
> Please can you confirm that performance is OK if you use 2.4 as a dom0?
> (It doesn't matter what you use as guests.)
>
> Thanks,
> Ian
> > > > I found out that dom0 does file-system IO and raw IO (using dd as a
> > > > tool to test throughput from the disk) is about exactly the same as
> > > > when using a standard Linux kernel without Xen. But the raw IO from
> > > > DomU to an unused disk (a second disk in the system) is limited to
> > > > forty percent of the speed I get within Dom0.
>
> Is the second disk exactly the same as the first one? I'll try an IO test
> here on the same disk array with dom0 and domU and see what I get.

I've reproduced the problem and it's a real issue.

It only affects reads, and is almost certainly down to how the blkback driver passes requests down to the actual device.

Does anyone on the list actually understand the changes made to Linux block IO between 2.4 and 2.6?

In the 2.6 blkfront there is no run_task_queue() to flush requests to the lower layer, and we use submit_bio() instead of 2.4's generic_make_request(). It looks like this is happening synchronously rather than queueing multiple requests. What should we be doing to cause things to be batched?

Thanks,
Ian
On Monday 28 March 2005 14:14, Ian Pratt wrote:
> I've reproduced the problem and it's a real issue.
>
> It only affects reads, and is almost certainly down to how the blkback
> driver passes requests down to the actual device.
>
> Does anyone on the list actually understand the changes made to Linux
> block IO between 2.4 and 2.6?
>
> In the 2.6 blkfront there is no run_task_queue() to flush requests to
> the lower layer, and we use submit_bio() instead of 2.4's
> generic_make_request(). It looks like this is happening synchronously
> rather than queueing multiple requests. What should we be doing to cause
> things to be batched?

There are multiple IO schedulers in 2.6. Do you know which one is being used? It should say somewhere in the boot log. Some read-ahead code also changed in the 2.6.10-11 range.

So far I have not been able to reproduce this in xen-unstable with 2.6. I am building xen-2.0.5 for a look.

-Andrew
On Monday 28 March 2005 14:14, Ian Pratt wrote:
> In the 2.6 blkfront there is no run_task_queue() to flush requests to
> the lower layer, and we use submit_bio() instead of 2.4's
> generic_make_request(). It looks like this is happening synchronously
> rather than queueing multiple requests. What should we be doing to cause
> things to be batched?

To my knowledge you cannot queue multiple bio requests at once; the IO schedulers should batch them up before submitting to the actual devices.

I tried xen-2.0.5 and xen-unstable with a sequential read test using a 256k request size and 8 reader threads with O_DIRECT on an LVM RAID-0 SCSI array (no HW cache) and got:

xen-2-dom0-2.6.10: 177 MB/sec
xen-2-domU-2.6.10: 185 MB/sec
xen-3-dom0-2.6.11: 177 MB/sec
xen-3-domU-2.6.11: 185 MB/sec

Better results with VBD :) I am wondering if going through two layers of IO schedulers streams the IO better. I was using the AS scheduler; I am going to try the noop scheduler and see what I get.

What block size were you using with dd?

-Andrew
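[For reference, a minimal single-threaded O_DIRECT sequential-read tester along these lines might look like the sketch below. The device path, request size and total size are placeholders; this is not the actual benchmark behind the numbers quoted above.]

	/*
	 * Sketch of an O_DIRECT sequential-read throughput test.
	 * Reads 1 GB from a block device in fixed-size requests and
	 * prints the achieved bandwidth.
	 */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/time.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
	    const char *dev = (argc > 1) ? argv[1] : "/dev/hdb6"; /* placeholder device */
	    size_t reqsz = 256 * 1024;              /* 256k per read */
	    long long total = 1024LL * 1024 * 1024; /* read 1 GB in total */
	    long long done = 0;
	    struct timeval t0, t1;
	    double secs;
	    void *buf;
	    int fd;

	    /* O_DIRECT requires an aligned buffer. */
	    if (posix_memalign(&buf, 4096, reqsz) != 0) {
	        perror("posix_memalign");
	        return 1;
	    }

	    fd = open(dev, O_RDONLY | O_DIRECT);
	    if (fd < 0) {
	        perror("open");
	        return 1;
	    }

	    gettimeofday(&t0, NULL);
	    while (done < total) {
	        ssize_t n = read(fd, buf, reqsz);
	        if (n <= 0)
	            break;
	        done += n;
	    }
	    gettimeofday(&t1, NULL);

	    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	    printf("%lld bytes in %.2f s = %.1f MB/s\n", done, secs, done / secs / 1e6);

	    close(fd);
	    return 0;
	}

Because every read bypasses the page cache, the request size passed on the command line directly controls how much IO is outstanding against the device, which is what makes this kind of test useful for exposing the latency issue discussed in this thread.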
> I tried xen-2.0.5 and xen-unstable with a sequential read test using a
> 256k request size and 8 reader threads with O_DIRECT on an LVM RAID-0
> SCSI array (no HW cache) and got:
>
> xen-2-dom0-2.6.10: 177 MB/sec
> xen-2-domU-2.6.10: 185 MB/sec
> xen-3-dom0-2.6.11: 177 MB/sec
> xen-3-domU-2.6.11: 185 MB/sec

Please can you try a simple 'dd if=/dev/sdaXX of=/dev/null bs=1024k count=4096' to read 4GB from the partition, both in dom0 and domU.

When booting, I get the following output, which I presume is the default?

  elevator: using anticipatory as default io scheduler

Thanks,
Ian
Andrew Theurer <habanero <at> us.ibm.com> writes:

> Better results with VBD :) I am wondering if going through two layers of
> IO schedulers streams the IO better. I was using the AS scheduler; I am
> going to try the noop scheduler and see what I get.
>
> What block size were you using with dd?

My dd command was always the same: "dd if=/dev/hdb6 bs=64k count=1000". It took 1.6 seconds on hdb6 and 2.2 seconds on hda1 when running in Dom0, and it took 4.6 seconds on hdb6 and 5.8 seconds on hda1 when running in DomU. I did one experiment with count=10000 and it took ten times as long in each of the four cases.

I have done the following tests:

DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 301 sec
DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 370 sec

Dom0 : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 115 sec
Dom0 : dd if=/dev/hda1 of=/dev/null bs=1024k count=4000 ; duration 140 sec

Peter
> My dd command was always the same: "dd if=/dev/hdb6 bs=64k count=1000".
>
> I have done the following tests:
>
> DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 301 sec
> DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 370 sec
>
> Dom0 : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 115 sec
> Dom0 : dd if=/dev/hda1 of=/dev/null bs=1024k count=4000 ; duration 140 sec

OK, I have reproduced this with both dd and O_DIRECT now. For O_DIRECT I used what was the effective dd request size (128k) and got similar results. My numbers are much worse because I am driving 14 disks:

dom0: 153.5 MB/sec
domU:  12.7 MB/sec

It looks like there might be a problem where we are not getting a timely response back from the dom0 VBD driver that an IO request is complete, which limits the number of outstanding requests to a level that cannot keep the disk well utilized. If you drive enough outstanding IO requests (which can be done either with O_DIRECT and large requests, or with a much larger readahead setting for buffered IO), it's not an issue.

In the domU, can you try setting the readahead size to a much larger value using hdparm? Something like hdparm -a 2028, then run dd?

-Andrew
On Sun, Mar 27, 2005 at 06:41:27PM +0100, Ian Pratt wrote:
> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU, and finding that the domU dd performance is 40% of the dom0
> performance?
>
> I've not heard of anyone else having problems like this. What happens if
> you use a partition on the 1st disk?

I reported the same kind of problem earlier too. A 2.4 domU is really slow (1/3 the speed of a 2.6 dom0); a 2.6 domU is faster, but not even close to the speed of a 2.6 dom0. My tests were on top of LVM over software RAID-5.

-- 
Pasi Kärkkäinen
> It looks like there might be a problem where we are not getting a timely
> response back from the dom0 VBD driver that an IO request is complete,
> which limits the number of outstanding requests to a level that cannot
> keep the disk well utilized. If you drive enough outstanding IO requests
> (which can be done either with O_DIRECT and large requests, or with a much
> larger readahead setting for buffered IO), it's not an issue.

Andrew, please could you try this with a 2.4 dom0 and a 2.6 domU.

Thanks,
Ian
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
> When booting, I get the following output, which I presume is the default?
>
>   elevator: using anticipatory as default io scheduler

Yes, the output is:

  elevator: using anticipatory as default io scheduler

Peter
Andrew Theurer <habanero <at> us.ibm.com> writes:
> In the domU, can you try setting the readahead size to a much larger value
> using hdparm? Something like hdparm -a 2028, then run dd?

It's Tuesday now, and I am working in the office with my two machines with the Promise controller. The two differ in that one uses IDE disks, while the other, newer one has SATA disks. I have restricted myself to the older computer. It has one disk, a Maxtor 6Y120L0, 120 GB with a 2048 KB cache. On that machine the disk is hde and the exported slice is hde1. The slice is not otherwise in use, and I am running the OS from a loopback file as the root fs.

I have done a

  "dd if=/dev/hde1 of=/dev/null bs=1024k count=1024"

in domU. hdparm told me that the default setup was 256k readahead. I have tested the performance with the following readahead settings:

readahead    | duration
128 sectors  | 160 sec
256 sectors  | 76 sec
512 sectors  | 18.5 sec
1024 sectors | 19.5 sec
2048 sectors | 786 sec
1536 sectors | 775 sec
1200 sectors | 457 sec
1000 sectors | 20 sec
800 sectors  | 18.5 sec
600 sectors  | 18.5 sec

dom0 takes 18.0 sec no matter what the readahead setting in Dom0 is.

Peter
Hi,

On Mon, Mar 28, 2005 at 09:14:10PM +0100, Ian Pratt wrote:
> I've reproduced the problem and it's a real issue.
>
> It only affects reads, and is almost certainly down to how the blkback
> driver passes requests down to the actual device.

Two points to look at:

* Block size (file systems normally set this to 4k; the default is 1k).
* Readahead (you need to do it, otherwise you end up issuing tiny
  requests). You can tune it in sysfs.

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
> "dd if=/dev/hde1 of=/dev/null bs=1024k count=1024" > > in domU. > > hdparm told that the default setup was 256k readahead.Do you mean KB or sectors?> I have tested the performance with the following readahead settings: > > readahead | duration > 128 sectors | 160 sec > 256 sectors | 76 sec > 512 sectors | 18.5 sec > 1024 sectors | 19.5 sec > 1200 sectors | 457 sec > dom0 takes 18.0 secs no matter of the readahead setting in Dom0 is.Would you mind repeating these experiments with a 2.4 dom0 and a 2.6domU ? The performance cliff below 512 and above 1024 sectors is spectacular. This is all rather confusing, but at least we know it can be made to work fast. Changing the domU readahead is unlikely to be the right fix. We just need to figure out how to keep it on the sweet spot... Thanks, Ian _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> Would you mind repeating these experiments with a 2.4 dom0 and a 2.6 domU?

Also, please could you try exporting the device into the domU as a SCSI device, e.g. sda1, rather than as an IDE device, hde1 or hda1. [Yes, I know this shouldn't make any difference, but I have a suspicion it will.]

Thanks,
Ian
Has anyone looked into using the other schedulers? Potentially noop or deadline for Dom0 with deadline or anticipatory for DomUs, or the other way around: noop/deadline in the DomUs and cfq/deadline in Dom0?

It actually makes some sense that DomU performance would be degraded relative to Dom0's, as the request has to go through the DomU's scheduler (which was typically designed to run asynchronously with some "fairness queuing") which is limited by Xen's BVT scheduler, and then through Dom0's disk I/O scheduler, which is also limited by Xen's BVT scheduler (doesn't it?). Enabling/disabling the preemptible kernel may also shed some light on the situation.

If this becomes an item open to modification for performance reasons, I'd prefer to have Dom0 set the performance of the DomUs. It wouldn't really matter for the moment, but once DomUs get to boot their own kernel (as in hosting services providing Xen'd servers/services where the client can compile their own kernel - which has been talked about), this will become a requirement/feature request.

I was actually going to do some testing in these areas, but my test box (AMD 3000+, water cooled) overheated and fried the northbridge/memory. Oh, the joys of living in the tropics ;-) A new motherboard (upgraded to AMD64) should arrive at the end of the week or early next week, so I can test then if no one else gets around to it.

Regards,
Brian.

On Tue, 2005-03-29 at 09:38, Ian Pratt wrote:
> The performance cliff below 512 and above 1024 sectors is spectacular.
> This is all rather confusing, but at least we know it can be made to
> work fast. Changing the domU readahead is unlikely to be the right fix;
> we just need to figure out how to keep it on the sweet spot...
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
> Also, please could you try exporting the device into the domU as a SCSI
> device, e.g. sda1, rather than as an IDE device, hde1 or hda1. [Yes, I
> know this shouldn't make any difference, but I have a suspicion it will.]

Ian,

I will do the tests you asked for, but today is my wife's birthday and I am already at home, so I have no access to my test computers.

I have done some testing with the second, newer host with SATA disks. Changing the readahead had no effect on the reduced throughput there. I do not remember the exact ratio, but I think it was quite similar to the IDE disks with a readahead of 256 sectors. I will report on that tomorrow in a more detailed fashion, and I will do the tests with Linux 2.4 as domU.

Peter
On Tuesday 29 March 2005 02:13, Ian Pratt wrote:
> Andrew, please could you try this with a 2.4 dom0 and a 2.6 domU.

2.4 might take a little while for me, as I am running Fedora Core 3 with udev. If anyone has an easy way to get around the hotplug/udev stuff, then I can do this.

I did run a sequential read on a single disk again (using the noop IO scheduler in both domains) with various request sizes with O_DIRECT while capturing iostat output. The results are interesting. I have included the data in a file because it would just line-wrap and be unreadable in this email. Notice the service commit times for the domU tests: it's like the IO request queue is being plugged for a minimum of 10 ms in dom0. Merges happening for >4K requests in dom0 (while hosting domU's IO) seem to support this.

-Andrew
> 2.4 might take a little while for me, as I am running Fedora Core 3 with
> udev. If anyone has an easy way to get around the hotplug/udev stuff, then
> I can do this.

You can run a populated /dev "underneath" the udev stuff quite happily; e.g. if you boot into FC3 with udev, do:

  cd /dev/
  tar zcpf /root/foo.tgz .

If you can boot from a rescue CD or similar, just mount your FC3 partition and untar the device nodes. Works just fine.

> I did run a sequential read on a single disk again (using the noop IO
> scheduler in both domains) with various request sizes with O_DIRECT while
> capturing iostat output. The results are interesting. Notice the service
> commit times for the domU tests: it's like the IO request queue is being
> plugged for a minimum of 10 ms in dom0. Merges happening for >4K requests
> in dom0 (while hosting domU's IO) seem to support this.

Ah - thanks for this -- will take a detailed look shortly.

cheers,

S.
Hi Ian,

On Tue, Mar 29, 2005 at 07:09:50PM +0100, Ian Pratt wrote:
> We'd really appreciate your help on this, or from someone else at SuSE
> who actually understands the Linux block layer?

I'm Cc'ing Jens ...

> In the 2.6 blkfront driver, what scheduler should we be registering
> with? What should we be setting as max_sectors? Are there other
> parameters we should be setting that we aren't? (block size?)

I think noop is a good choice for secondary domains, as you don't want to be too clever there, otherwise you stack a clever scheduler on top of a clever scheduler. noop basically only does front- and back-merging to make the request sizes larger.

But you probably should initialize the readahead sectors.

Please test the attached patch.

It fixed the problem for me, but my testing was very limited; I only had a small loopback-mounted root fs to test with quickly.

Note that initializing to 256 (128k) would be OK as well (and might be the better default); it seems to be set to 256 (128k) by default, but it's not ... If you explicitly set it to 256, the performance still increases tremendously.

> In the blkback driver that actually issues the IOs in dom0, is there
> something we should be doing to cause IOs to get batched? In 2.4 we used
> a task_queue to push the IO through to the disk having queued it with
> generic_make_request(). In 2.6 we're currently using submit_bio() and
> just hoping that batching happens.

I don't think the blkback driver does anything wrong here.

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
On Tuesday 29 March 2005 16:45, Kurt Garloff wrote:
> But you probably should initialize the readahead sectors.
>
> Please test the attached patch.

This should help the case where one is doing buffered IO (so readahead gets used), but for O_DIRECT I still think we will have a problem. On Dom0 I can drive 58 MB/sec of sequential reads with O_DIRECT with just a 32k request size, but on domU with the same request size I can only get ~6 MB/sec.

I am still wondering if something is up with the backend driver. It appears that the backend driver only submits requests to the actual device every 10 ms. With a much larger request size (for O_DIRECT) or a large readahead, 10 ms is often enough to keep the disk streaming data. With smaller request sizes or a small readahead, the disk just doesn't read efficiently.

-Andrew
Hi Andrew,

On Tue, Mar 29, 2005 at 04:59:18PM -0600, Andrew Theurer wrote:
> This should help the case where one is doing buffered IO (so readahead
> gets used), but for O_DIRECT I still think we will have a problem. On
> Dom0 I can drive 58 MB/sec of sequential reads with O_DIRECT with just a
> 32k request size, but on domU with the same request size I can only get
> ~6 MB/sec.

I can't reproduce this. Does this depend on whether your domU root is a loopback-mounted file or a real partition/LVM device?

> I am still wondering if something is up with the backend driver. It
> appears that the backend driver only submits requests to the actual
> device every 10 ms. With a much larger request size (for O_DIRECT) or a
> large readahead, 10 ms is often enough to keep the disk streaming data.
> With smaller request sizes or a small readahead, the disk just doesn't
> read efficiently.

We might have a problem with unplugging then.

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
On Tuesday 29 March 2005 17:19, Kurt Garloff wrote:
> I can't reproduce this. Does this depend on whether your domU root is a
> loopback-mounted file or a real partition/LVM device?

I am not sure. What program are you using for O_DIRECT reads? I use a real LVM device for domU root and then another whole disk for the read tests.

> We might have a problem with unplugging then.

That's what I suspect, but I do not know the driver code well enough to say for sure.

-Andrew
On Wed, Mar 30 2005, Kurt Garloff wrote:
> But you probably should initialize the readahead sectors.
>
> Please test the attached patch.
>
> From: Kurt Garloff <garloff@suse.de>
> Subject: Initialize readahead in vbd Q init code
>
> The domU read performance is poor without readahead, so
> better make sure we initialize this value.
>
> Signed-off-by: Kurt Garloff <garloff@suse.de>
>
> Index: linux-2.6.11/drivers/xen/blkfront/vbd.c
> ===================================================================
> --- linux-2.6.11.orig/drivers/xen/blkfront/vbd.c
> +++ linux-2.6.11/drivers/xen/blkfront/vbd.c
> @@ -268,8 +268,11 @@ static struct gendisk *xlvbd_get_gendisk
>          xlbd_blk_queue, BLKIF_MAX_SEGMENTS_PER_REQUEST);
>
>      /* Make sure buffer addresses are sector-aligned. */
>      blk_queue_dma_alignment(xlbd_blk_queue, 511);
> +
> +    /* Set readahead */
> +    blk_queue_max_sectors(xlbd_blk_queue, 512);

This isn't read-ahead, it's the max request size setting. The actual read-ahead setting is in q->backing_dev_info.ra_pages.

There is a helper function for this type of stacking, blk_queue_stack_limits(). You call it after setting up your own queue:

    blk_queue_stack_limits(my_queue, bottom_queue);

I'll check the xen block driver to see if there's anything else that sticks out.

--
Jens Axboe
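[A rough sketch of the stacking approach Jens describes, for a 2.6-era driver that can see the queue of the device it sits on. The function and variable names here are made up for illustration, and this only applies where the lower queue is visible in the same kernel, which is not the case across the domU/dom0 split.]

	#include <linux/blkdev.h>

	/*
	 * Illustrative only: inherit hard limits from the queue underneath
	 * and carry over its readahead setting.
	 */
	static void init_stacked_queue(request_queue_t *my_queue,
	                               request_queue_t *bottom_queue)
	{
		/* Copies max_sectors, segment limits, alignment etc. from below. */
		blk_queue_stack_limits(my_queue, bottom_queue);

		/* The real readahead knob lives in backing_dev_info, in pages. */
		my_queue->backing_dev_info.ra_pages =
			bottom_queue->backing_dev_info.ra_pages;
	}

The point of the distinction is that blk_queue_max_sectors() bounds how large a single request may grow, while ra_pages controls how far the VM reads ahead of the application; only the latter is the "readahead" being tuned with hdparm in the tests above.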
On Wed, Mar 30, 2005 at 12:45:03AM +0200, Kurt Garloff wrote:
> Please test the attached patch.

Delete it; blk_queue_max_sectors() is called a bit above. Adding printk()s now to see what's going on there.

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
> I'll check the xen block driver to see if there's anything else that
> sticks out.
>
> Jens Axboe

Jens, I'd really appreciate this.

The blkfront/blkback drivers have rather evolved over time, and I don't think any of the core team fully understand the block-layer differences between 2.4 and 2.6.

There's also some junk left in there from when the backend was in Xen itself, back in the days of 1.2, though Vincent has prepared a patch to clean this up, make 'refreshing' of VBDs work (for size changes), and allow the blkfront driver to import whole disks rather than partitions. We had this functionality on 2.4 but lost it in the move to 2.6.

My bet is that the 2.6 backend is where the true performance bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good performance under a wide variety of circumstances, while using a 2.6 dom0 is far more pernickety. I agree with Andrew: I suspect it's the work queue changes that are biting us when we don't have many outstanding requests.

Thanks,
Ian
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
> My bet is that the 2.6 backend is where the true performance bug lies.
> Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
> performance under a wide variety of circumstances, while using a 2.6 dom0
> is far more pernickety. I agree with Andrew: I suspect it's the work queue
> changes that are biting us when we don't have many outstanding requests.

I have done my simple dd on hde1 with two different settings of readahead: 256 sectors and 512 sectors. These are the iostat results:

DOM0, readahead 512 sectors:
Device:  rrqm/s     wrqm/s  r/s      w/s   rsec/s     wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hde      115055.40  2.00    592.40   0.80  115647.80  22.40   57823.90  11.20  194.99    2.30      3.88   1.68   99.80
hda      0.00       0.00    0.00     0.00  0.00       0.00    0.00      0.00   0.00      0.00      0.00   0.00   0.00
avg-cpu: %user 0.20  %nice 0.00  %system 31.60  %iowait 14.20  %idle 54.00

DOMU, readahead 512 sectors:
Device:  rrqm/s     wrqm/s  r/s       w/s   rsec/s     wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hda1     0.00       0.20    0.00      0.00  0.00       3.20    0.00      1.60   0.00      0.00      0.00   0.00   0.00
hde1     102301.40  0.00    11571.00  0.00  113868.80  0.00    56934.40  0.00   9.84      68.45     5.92   0.09   100.00
avg-cpu: %user 0.00  %nice 0.00  %system 35.00  %iowait 65.00  %idle 0.00

DOM0, readahead 256 sectors:
Device:  rrqm/s     wrqm/s  r/s      w/s   rsec/s    wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hde      28289.20   1.80    126.80   0.40  28416.00  17.60   14208.00  8.80   223.53    1.06      8.32   7.85   99.80
hda      0.00       0.00    0.00     0.00  0.00      0.00    0.00      0.00   0.00      0.00      0.00   0.00   0.00
avg-cpu: %user 0.20  %nice 0.00  %system 1.60  %iowait 5.60  %idle 92.60

DOMU, readahead 256 sectors:
Device:  rrqm/s     wrqm/s  r/s      w/s   rsec/s    wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hda1     0.00       0.20    0.00     0.40  0.00      4.80    0.00      2.40   12.00     0.00      0.00   0.00   0.00
hde1     25085.60   0.00    3330.40  0.00  28416.00  0.00    14208.00  0.00   8.53      30.54     9.17   0.30   100.00
avg-cpu: %user 0.20  %nice 0.00  %system 1.40  %iowait 98.40  %idle 0.00

What surprises me is that the service time for requests in DOM0 decreases dramatically when readahead is increased from 256 to 512 sectors. If the output of iostat is reliable, it tells me that requests in DOMU are assembled to about 8 to 10 sectors in size, while DOM0 merges them to about 200 sectors or even more. A readahead of 256 sectors results in an average queue size of about 1, while changing the readahead to 512 sectors results in an average queue size of slightly above 2 on DOM0. With a readahead of 256 sectors, the service times in DOM0 are in the range of the typical seek time of a modern IDE disk, while they are significantly lower with a readahead of 512 sectors.

As I have mentioned, this is the system with only one installed disk; that accounts for the write activity on the disk. The two write requests per second go to a different partition, and those result in four required seeks per second. That should not be a reason for all requests to take about a seek time as their service time.

I have done a number of further tests on various systems. In most cases I failed to achieve service times below 8 ms in Dom0; the only counterexample is reported above. It seems to me that at low readahead values the amount of data requested from the disk per request is simply the readahead amount, and such a request takes about one seek time; thus I get lower performance when I work with small readahead values.

What I do not understand at all is why throughput collapses with large readahead sizes. I found in mm/readahead.c that the readahead size for a file is reduced if the readahead is not efficient; I suspect that this mechanism might lead to readahead being switched off for this file. With readahead set to 2048 sectors, the product of avgqu-sz and avgrq-sz reported by iostat drops to 4 to 5 physical pages.

Peter
peter bier wrote:
> I have done my simple dd on hde1 with two different settings of readahead:
> 256 sectors and 512 sectors.

I added a counter, incremented every time the blkback daemon was woken up, and ran the read test in domU. With 32k and 320k request sizes (O_DIRECT), I consistently got 200 wake-ups/second. I expected 100/second, the same interval as the minimum service (svctm) times I am seeing, but either way 200/sec is way too low for small request sizes. I think this confirms the latency issue. Not sure yet why it cannot wake up more frequently.

-Andrew
On Wed, Mar 30 2005, Ian Pratt wrote:
> My bet is that the 2.6 backend is where the true performance bug lies.
> Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
> performance under a wide variety of circumstances, while using a 2.6 dom0
> is far more pernickety. I agree with Andrew: I suspect it's the work queue
> changes that are biting us when we don't have many outstanding requests.

You never schedule the queues you submit the IO against on the 2.6 kernel; you only have a tq_disk run for 2.4 kernels. This basically puts you at the mercy of the timeout unplugging, which is really suboptimal unless you can keep the IO queue of the target busy at all times.

You need to either mark the last bio going to that device as BIO_SYNC, or do a blk_run_queue() on the target queue after having submitted all the IO in this batch for it.

--
Jens Axboe
On Thu, Mar 31 2005, Jens Axboe wrote:
> You need to either mark the last bio going to that device as BIO_SYNC,
> or do a blk_run_queue() on the target queue after having submitted all
> the IO in this batch for it.

Here is a temporary work-around, this should bring you close to 100% performance at the cost of some extra unplugs. Uncompiled.

--- blkback.c~	2005-03-31 09:06:16.000000000 +0200
+++ blkback.c	2005-03-31 09:09:27.000000000 +0200
@@ -481,7 +481,6 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
-        struct bio_vec *bv;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -494,17 +493,12 @@
         bio->bi_private = pending_req;
         bio->bi_end_io  = end_block_io_op;
         bio->bi_sector  = phys_seg[i].sector_number;
-        bio->bi_rw      = operation;
 
-        bv = bio_iovec_idx(bio, 0);
-        bv->bv_page   = virt_to_page(MMAP_VADDR(pending_idx, i));
-        bv->bv_len    = phys_seg[i].nr_sects << 9;
-        bv->bv_offset = phys_seg[i].buffer & ~PAGE_MASK;
+        bio_add_page(bio, virt_to_page(MMAP_VADDR(pending_idx, i)),
+                     phys_seg[i].nr_sects << 9,
+                     phys_seg[i].buffer & ~PAGE_MASK);
 
-        bio->bi_size    = bv->bv_len;
-        bio->bi_vcnt++;
-
-        submit_bio(operation, bio);
+        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
     }
 #endif

--
Jens Axboe
On 31 Mar 2005, at 08:10, Jens Axboe wrote:

> Here is a temporary work-around, this should bring you close to 100%
> performance at the cost of some extra unplugs. Uncompiled.

Yep, this does the job for me. Thanks!

Avoiding the extra unplugs is harder than it sounds, as each request in a batch may go to a different request queue. To minimise the number of unplugs per batch we'd need to add code to remember which queues we had used in the current batch, then kick them at the end of the batch. Is there likely to be any measurable benefit from reducing the number of unplugs?

-- Keir
On Thu, Mar 31 2005, Keir Fraser wrote:
> Yep, this does the job for me. Thanks! Avoiding the extra unplugs is
> harder than it sounds, as each request in a batch may go to a different
> request queue. To minimise the number of unplugs per batch we'd need to
> add code to remember which queues we had used in the current batch,
> then kick them at the end of the batch.

Or just keep track of the previous queue: if it has changed, unplug the
previous queue and update the previous-queue variable.

> Is there likely to be any measurable benefit from reducing the number
> of unplugs?

Probably not, since the plugging happened at the front end as well. So
you should get a nice stream of I/O either way.

--
Jens Axboe
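To make the suggestion concrete, here is a minimal, uncompiled sketch of
the "track the previous queue" idea using 2.6-era block-layer names
(request_queue_t, bdev_get_queue(), blk_run_queue()). The helpers
submit_one() and end_of_batch() are invented for illustration; this is
not the patch that was eventually checked in.

/* Remember the last queue we submitted to; kick it whenever the next
 * bio targets a different queue, and once more at the end of the batch. */
static request_queue_t *plugged_queue;

static void submit_one(int operation, struct bio *bio)
{
    request_queue_t *q = bdev_get_queue(bio->bi_bdev);

    if ( q != plugged_queue )
    {
        if ( plugged_queue != NULL )
            blk_run_queue(plugged_queue);   /* unplug the previous queue */
        plugged_queue = q;
    }
    submit_bio(operation, bio);
}

static void end_of_batch(void)
{
    /* Kick whatever queue is still pending at the end of the batch. */
    if ( plugged_queue != NULL )
    {
        blk_run_queue(plugged_queue);
        plugged_queue = NULL;
    }
}

As discussed further down the thread, the backend later moved from
blk_run_queue() to calling the queue's own unplug_fn, but the tracking
logic is the same.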
Rumor has it that on Thu, Mar 31, 2005 at 10:19:01AM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Keir Fraser wrote:
> > Is there likely to be any measurable benefit from reducing the number
> > of unplugs?
>
> Probably not, since the plugging happened at the front end as well. So
> you should get a nice stream of I/O either way.

This affects merging though, right? I don't think the front end has done
any merging.

Also, the BIO_RW_SYNC bit is sometimes ignored in __make_request due to
the bad queue-locking interactions with scsi_request_fn. The bio can be
completed before the bio_sync() test in __make_request. Since there is no
other reference to the bio, it can be freed and reused by the time it is
tested for BIO_RW_SYNC.

Cheers,

Phil

--
Philip R. Auld, Ph.D.          Egenera, Inc.
Software Architect             165 Forest St.
(508) 858-2628                 Marlboro, MA 01752
Hi,

On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> This affects merging though, right? I don't think the front end has
> done any merging.

The noop elevator does front and back merging. My understanding is that
it's used in the frontend driver.

Otherwise, unplugging on every block would indeed be quite bad ...

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
On Thu, Mar 31 2005, Kurt Garloff wrote:
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front end has
> > done any merging.
>
> The noop elevator does front and back merging. My understanding is that
> it's used in the frontend driver.
>
> Otherwise, unplugging on every block would indeed be quite bad ...

Not necessarily - either your I/O rate is not fast enough to sustain a
substantial queue depth, in which case you get plugging on basically
every I/O anyway; or the I/O rate is high enough to maintain a queue
depth > 1, in which case the plugging will never take place because the
queue never empties.

So all in all, I don't think the temporary work-around is such a bad
idea. I would still rather implement the queue tracking, though; it
should not be more than a few lines of code.

And Philip, I will get the bio_sync() change merged :-)

--
Jens Axboe
On Thu, Mar 31 2005, Jens Axboe wrote:
> So all in all, I don't think the temporary work-around is such a bad
> idea. I would still rather implement the queue tracking, though; it
> should not be more than a few lines of code.

There are still cases where it will be suboptimal, of course; I didn't
intend to claim it will always be as fast as queue tracking! If you are
unlucky enough that the first request reaches the target device and gets
started before the next one, you will have a small and a large part of
any given request executed. This isn't good for performance, naturally.
But queueing is so fast that I would be surprised if this happened much
in the real world.

--
Jens Axboe
On 31 Mar 2005, at 16:39, Jens Axboe wrote:
> So all in all, I don't think the temporary work-around is such a bad
> idea. I would still rather implement the queue tracking, though; it
> should not be more than a few lines of code.

I've checked in something along the lines of what you described into
both the 2.0-testing and the unstable trees. It looks to have identical
performance to the original simple patch, at least for a bulk 'dd'.

 -- Keir
Keir Fraser wrote:
> I've checked in something along the lines of what you described into
> both the 2.0-testing and the unstable trees. It looks to have identical
> performance to the original simple patch, at least for a bulk 'dd'.

I'll do a pull of unstable and see what I get with O_DIRECT, thanks.

-Andrew
Jens Axboe wrote:
> But queueing is so fast that I would be surprised if this happened much
> in the real world.

Although the usual answer to "which scheduling algorithm is best?" is
almost always "it depends on the workload", it was suggested to me that
CFQ was still the best option to go with. What do people feel about
that? (Or is AS going to remain the default?)

Also, we're making the assumption here that guest OS = virtual
driver/device. I would rather we not make that assumption always. This
may be moot because I was also told there might be a patch floating
around (-mm?) that allows you to select the scheduling algorithm on a
per-device basis. Anyone know if this is going to come in anytime soon?

thanks,
Nivedita
Rumor has it that on Thu, Mar 31, 2005 at 05:34:49PM +0200 Kurt Garloff said:
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front end has
> > done any merging.
>
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.

If that is the case, it can only merge things that are machine
contiguous. Current guests know this mapping, but can they get it when
running unmodified with VT-x? My experience showed very little, if any,
multipage I/O coming out of the front end.

> Otherwise, unplugging on every block would indeed be quite bad ...

Seems to be somewhat moot anyway, given the current change planned :)

Cheers,

Phil

--
Philip R. Auld, Ph.D.          Egenera, Inc.
Software Architect             165 Forest St.
(508) 858-2628                 Marlboro, MA 01752
Rumor has it that on Thu, Mar 31, 2005 at 05:39:26PM +0200 Jens Axboe said:
> And Philip, I will get the bio_sync() change merged :-)

Thanks! It's good to be transparent ;)

Phil

--
Philip R. Auld, Ph.D.          Egenera, Inc.
Software Architect             165 Forest St.
(508) 858-2628                 Marlboro, MA 01752
On Thu, Mar 31 2005, Nivedita Singhvi wrote:
> Although the usual answer to "which scheduling algorithm is best?" is
> almost always "it depends on the workload", it was suggested to me that
> CFQ was still the best option to go with. What do people feel about
> that? (Or is AS going to remain the default?)

Really the only one that you should not use is AS; anything else will be
fine. AS should only ever be used at the bottom of the stack, if on a
single-spindle backing. CFQ will be fine, as will deadline and noop.

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption always. This
> may be moot because I was also told there might be a patch floating
> around (-mm?) that allows you to select the scheduling algorithm on a
> per-device basis. Anyone know if this is going to come in anytime soon?

That patch has been in mainline since 2.6.10. You can change schedulers
by echoing the preferred scheduler to
/sys/block/<device>/queue/scheduler - reading that file will show you
which schedulers are available.

--
Jens Axboe
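For completeness, the per-device switch Jens describes can also be driven
from a program rather than a shell echo. The following is only a small
userspace sketch that writes the scheduler name to that sysfs attribute;
the device "hda" and scheduler "cfq" are example values, not anything
mandated by the thread.

/* Sketch: select the I/O scheduler for a block device via sysfs. */
#include <stdio.h>

static int set_scheduler(const char *dev, const char *sched)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    /* Writing the scheduler name switches the elevator for this queue. */
    fprintf(f, "%s\n", sched);
    fclose(f);
    return 0;
}

int main(void)
{
    if (set_scheduler("hda", "cfq") != 0)
        perror("set_scheduler");
    return 0;
}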
On Thu, Mar 31 2005, Keir Fraser wrote:
> I've checked in something along the lines of what you described into
> both the 2.0-testing and the unstable trees. It looks to have identical
> performance to the original simple patch, at least for a bulk 'dd'.

Can you post the patch here for review? Or just point me somewhere I can
view it.

--
Jens Axboe
> > I've checked in something along the lines of what you described into
> > both the 2.0-testing and the unstable trees. It looks to have
> > identical performance to the original simple patch, at least for a
> > bulk 'dd'.
>
> Can you post the patch here for review? Or just point me somewhere I
> can view it.

Jens,

Thanks for your help on this.

Here's Keir's updated patch:
http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424c1abd7LgWMiaskLEEAAX7ffdkXQ

Which is based on this earlier patch from you:
http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424bba4091aV1FuNksY_4w_z4Tvr3g

Best,
Ian

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:52:27 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:52:27 -08:00
@@ -481,7 +481,6 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
-        struct bio_vec *bv;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -494,17 +493,14 @@
         bio->bi_private = pending_req;
         bio->bi_end_io  = end_block_io_op;
         bio->bi_sector  = phys_seg[i].sector_number;
-        bio->bi_rw      = operation;
 
-        bv = bio_iovec_idx(bio, 0);
-        bv->bv_page   = virt_to_page(MMAP_VADDR(pending_idx, i));
-        bv->bv_len    = phys_seg[i].nr_sects << 9;
-        bv->bv_offset = phys_seg[i].buffer & ~PAGE_MASK;
+        bio_add_page(
+            bio,
+            virt_to_page(MMAP_VADDR(pending_idx, i)),
+            phys_seg[i].nr_sects << 9,
+            phys_seg[i].buffer & ~PAGE_MASK);
 
-        bio->bi_size    = bv->bv_len;
-        bio->bi_vcnt++;
-
-        submit_bio(operation, bio);
+        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
     }
 #endif

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/31 09:52:16+01:00 kaf24@firebug.cl.cam.ac.uk
#   Backport of Jens blkdev performance patch. I accidentally applied it
#   first to unstable.
#
# linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
#   2005/03/31 09:52:15+01:00 kaf24@firebug.cl.cam.ac.uk +6 -10
#   Backport of Jens blkdev performance patch. I accidentally applied it
#   first to unstable.
#

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:54:46 -08:00
@@ -66,6 +66,19 @@
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 static kmem_cache_t *buffer_head_cachep;
+#else
+static request_queue_t *plugged_queue;
+void bdev_put(struct block_device *bdev)
+{
+    request_queue_t *q = plugged_queue;
+    /* We might be giving up last reference to plugged queue. Flush if so. */
+    if ( (q != NULL) &&
+         (q == bdev_get_queue(bdev)) &&
+         (cmpxchg(&plugged_queue, q, NULL) == q) )
+        blk_run_queue(q);
+    /* It's now safe to drop the block device. */
+    blkdev_put(bdev);
+}
 #endif
 
 static int do_block_io_op(blkif_t *blkif, int max_to_do);
@@ -176,9 +189,15 @@
             blkif_put(blkif);
         }
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
         /* Push the batch through to disc. */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
         run_task_queue(&tq_disk);
+#else
+        if ( plugged_queue != NULL )
+        {
+            blk_run_queue(plugged_queue);
+            plugged_queue = NULL;
+        }
 #endif
     }
 }
@@ -481,6 +500,7 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
+        request_queue_t *q;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -500,7 +520,14 @@
             phys_seg[i].nr_sects << 9,
             phys_seg[i].buffer & ~PAGE_MASK);
 
-        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
+        if ( (q = bdev_get_queue(bio->bi_bdev)) != plugged_queue )
+        {
+            if ( plugged_queue != NULL )
+                blk_run_queue(plugged_queue);
+            plugged_queue = q;
+        }
+
+        submit_bio(operation, bio);
     }
 #endif

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h b/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h	2005-03-31 09:54:46 -08:00
@@ -30,8 +30,10 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
 typedef struct rb_root rb_root_t;
 typedef struct rb_node rb_node_t;
+extern void bdev_put(struct block_device *bdev);
 #else
 struct block_device;
+#define bdev_put(_b) ((void)0)
 #endif
 
 typedef struct blkif_st {

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c	2005-03-31 09:54:46 -08:00
@@ -150,7 +150,7 @@
     {
         DPRINTK("vbd_grow: device %08x doesn't exist.\n", x->extent.device);
         grow->status = BLKIF_BE_STATUS_EXTENT_NOT_FOUND;
-        blkdev_put(x->bdev);
+        bdev_put(x->bdev);
         goto out;
     }
 
@@ -255,7 +255,7 @@
     *px = x->next; /* ATOMIC: no need for vbd_lock. */
 
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-    blkdev_put(x->bdev);
+    bdev_put(x->bdev);
 #endif
 
     kfree(x);
@@ -307,7 +307,7 @@
     {
         t = x->next;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-        blkdev_put(x->bdev);
+        bdev_put(x->bdev);
 #endif
         kfree(x);
         x = t;
@@ -335,7 +335,7 @@
     {
         t = x->next;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-        blkdev_put(x->bdev);
+        bdev_put(x->bdev);
 #endif
         kfree(x);
         x = t;

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/31 16:43:57+01:00 kaf24@firebug.cl.cam.ac.uk
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
#
# linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +29 -2
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
#
# linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +2 -0
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
#
# linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +4 -4
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
#
On Thu, Mar 31 2005, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front end
> > > has done any merging.
> >
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
>
> If that is the case, it can only merge things that are machine
> contiguous. Current guests know this mapping, but can they get it when
> running unmodified with VT-x?
>
> My experience showed very little, if any, multipage I/O coming out of
> the front end.

There aren't that many users of multipage I/O yet. Direct I/O will use
it, and ext2 will as well; IIRC, -mm has patches for ext3 too. So it's
definitely improving :-)

--
Jens Axboe
On Thu, Mar 31 2005, Ian Pratt wrote:
> Here's Keir's updated patch:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424c1abd7LgWMiaskLEEAAX7ffdkXQ
>
> Which is based on this earlier patch from you:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424bba4091aV1FuNksY_4w_z4Tvr3g

I cannot immediately see if you call bdev_put() right after queueing the
I/O? If so, I think the patch looks fine. If not, you are missing the
last unplug :-)

--
Jens Axboe
Hi Niv,

On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> Although the usual answer to "which scheduling algorithm is best?" is
> almost always "it depends on the workload", it was suggested to me that
> CFQ was still the best option to go with. What do people feel about
> that? (Or is AS going to remain the default?)

This is a different discussion. But, yes, I would agree that CFQ (v3) is
the best default choice.

Jens, should we maybe make sure that the blockback driver uses different
(fake) UIDs for the domains that it serves, to provide fairness between
them? The next step would be to allow tweaking of I/O priorities. Or, to
make it more general, add a parameter (call it uid) that a block driver
can pass down to the I/O scheduler and that would normally be
current->uid but may be set differently?

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption always. This
> may be moot because I was also told there might be a patch floating
> around (-mm?) that allows you to select the scheduling algorithm on a
> per-device basis.

It's part of 2.6.11:

garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
noop anticipatory deadline [cfq]

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Philip R Auld wrote:
> > My experience showed very little, if any, multipage I/O coming out
> > of the front end.
>
> There aren't that many users of multipage I/O yet. Direct I/O will use
> it, and ext2 will as well; IIRC, -mm has patches for ext3 too. So it's
> definitely improving :-)

Sorry, I was being sloppy with terminology :)

What I was getting at was that the backend will split requests up and
issue each physical segment as a separate bio (at least in the 2.0.5
tree I have in front of me), and that none of these physical segments
was more than one page.

So the request merging in the backend OS is important, no?

Cheers,

Phil

--
Philip R. Auld, Ph.D.          Egenera, Inc.
Software Architect             165 Forest St.
(508) 858-2628                 Marlboro, MA 01752
On 31 Mar 2005, at 19:04, Jens Axboe wrote:
> I cannot immediately see if you call bdev_put() right after queueing
> the I/O? If so, I think the patch looks fine. If not, you are missing
> the last unplug :-)

That's not the job of bdev_put(): the final unplug is done at the end of
blkio_schedule -- the same place that I do a run_task_queue() when
compiling for Linux 2.4.

Cheers,
Keir
> What I was getting at was that the backend will split requests up and
> issue each physical segment as a separate bio (at least in the 2.0.5
> tree I have in front of me), and that none of these physical segments
> was more than one page.
>
> So the request merging in the backend OS is important, no?

Ah, this reminds me I have one more question for Jens.

Since all the bios that I queue up in a single invocation of
dispatch_rw_block_io() will actually be adjacent to each other (because
they're all from the same scatter-gather list), can I actually do
something like (very roughly):

  bio = bio_alloc(GFP_KERNEL, nr_psegs);
  for ( i = 0; i < nr_psegs; i++ )
      bio_add_page(bio, blah...);
  submit_bio(operation, bio);

Each of the biovecs that I queue may not be a full page in size (but
won't straddle a page boundary, of course). This would avoid the bios
having to be merged again later.

 -- Keir
On 31 Mar 2005, at 20:07, Keir Fraser wrote:
> Since all the bios that I queue up in a single invocation of
> dispatch_rw_block_io() will actually be adjacent to each other
> (because they're all from the same scatter-gather list)

I should add: I know that the code makes it look like each s-g element
might map somewhere entirely different from the previous one, but we no
longer support that mode of operation. Each VBD now always maps onto a
single, entire block device or partition.

 -- Keir
On Thu, Mar 31 2005, Keir Fraser wrote:
> Since all the bios that I queue up in a single invocation of
> dispatch_rw_block_io() will actually be adjacent to each other
> (because they're all from the same scatter-gather list), can I
> actually do something like (very roughly):
>
>   bio = bio_alloc(GFP_KERNEL, nr_psegs);
>   for ( i = 0; i < nr_psegs; i++ )
>       bio_add_page(bio, blah...);
>   submit_bio(operation, bio);
>
> Each of the biovecs that I queue may not be a full page in size (but
> won't straddle a page boundary, of course).

Yes, this is precisely what you should do; the current method is pretty
suboptimal. Basically, allocate a bio with nr_psegs bio_vecs and call
bio_add_page() for each page until it returns _less_ than the number of
bytes you requested. When it does that, submit that bio for I/O and
allocate a new bio with nr_psegs - submitted_segs bio_vecs attached.
Continue until you are done.

--
Jens Axboe
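For illustration, here is an uncompiled sketch of the batching loop Jens
describes, using the names that appear in the earlier patches (nr_psegs,
phys_seg[], pending_req, pending_idx, MMAP_VADDR(), end_block_io_op());
the bdev field on phys_seg is assumed purely for the sketch, and this is
not the code that went into the tree.

/* Pack as many physical segments as possible into each bio; start a new
 * bio only when bio_add_page() refuses to take the next segment. */
int i = 0;

while ( i < nr_psegs )
{
    /* Room for all remaining segments; bio_add_page() decides how many
     * actually fit in this bio. */
    struct bio *bio = bio_alloc(GFP_ATOMIC, nr_psegs - i);

    bio->bi_bdev    = phys_seg[i].bdev;
    bio->bi_private = pending_req;
    bio->bi_end_io  = end_block_io_op;
    bio->bi_sector  = phys_seg[i].sector_number;

    while ( i < nr_psegs )
    {
        unsigned int len = phys_seg[i].nr_sects << 9;

        /* Stop growing this bio as soon as less than 'len' is accepted;
         * the first add into a fresh bio is assumed to succeed. */
        if ( bio_add_page(bio, virt_to_page(MMAP_VADDR(pending_idx, i)),
                          len, phys_seg[i].buffer & ~PAGE_MASK) < len )
            break;
        i++;
    }

    submit_bio(operation, bio);   /* unplugging is handled separately */
}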
On Thu, Mar 31 2005, Philip R Auld wrote:
> What I was getting at was that the backend will split requests up and
> issue each physical segment as a separate bio (at least in the 2.0.5
> tree I have in front of me), and that none of these physical segments
> was more than one page.
>
> So the request merging in the backend OS is important, no?

I suppose it always is, since the merge criteria may have changed from
when the I/O was initially queued. If requests are always split into
single pages, then it becomes very important to merge at the backend.

--
Jens Axboe
On Thu, Mar 31 2005, Keir Fraser wrote:
> That's not the job of bdev_put(): the final unplug is done at the end
> of blkio_schedule -- the same place that I do a run_task_queue() when
> compiling for Linux 2.4.

Thanks for confirming, that sounds fine.

--
Jens Axboe
On Thursday 31 March 2005 11:55, Ian Pratt wrote:
> Thanks for your help on this.

BTW, I am now getting this with xen-unstable:

Process xenblkd (pid: 730, threadinfo=f7cc4000 task=f7c42510)
Stack: c022d172 f44b1a08 f363c6f0 f7cc4000 c046d40c c02849f8 f44b1a08 00000010
       00000000 f7c42510 c0115b0a 00000000 00000000 f7c42510 c17f1e48 c01092e6
       00000000 f7c42510 c0115b0a 00100100 00200200 00000000 00000000 00000000
Call Trace:
 [<c022d172>] blk_run_queue+0x38/0x91
 [<c02849f8>] blkio_schedule+0x126/0x149
 [<c0115b0a>] default_wake_function+0x0/0x12
 [<c01092e6>] ret_from_fork+0x6/0x1c
 [<c0115b0a>] default_wake_function+0x0/0x12
 [<c02848d2>] blkio_schedule+0x0/0x149
 [<c0107571>] kernel_thread_helper+0x5/0xb
Code: Bad EIP value.
 <6>note: xenblkd[730] exited with preempt_count 1
On 31 Mar 2005, at 21:49, Andrew Theurer wrote:
> BTW, I am now getting this with xen-unstable:
>
> Process xenblkd (pid: 730, threadinfo=f7cc4000 task=f7c42510)
> Call Trace:
>  [<c022d172>] blk_run_queue+0x38/0x91
>  [<c02849f8>] blkio_schedule+0x126/0x149

I wonder if blk_run_queue() is not the right thing to call. For example,
it ignores whether the queue has been forcibly stopped by the underlying
driver and doesn't check whether there are any requests that actually
require pushing. Plus, various drivers (swraid and probably lvm) have
their own unplug function, and blk_run_queue doesn't handle that.

Could you try again, but replace calls to blk_run_queue(plugged_queue)
in blkback.c with:

  if ( plugged_queue->unplug_fn )
      plugged_queue->unplug_fn(plugged_queue);

This looks like a better match with what various other drivers do (e.g.
swraid).

Thanks,
Keir
> Could you try again, but replace calls to blk_run_queue(plugged_queue)
> in blkback.c with:
>
>   if ( plugged_queue->unplug_fn )
>       plugged_queue->unplug_fn(plugged_queue);
>
> This looks like a better match with what various other drivers do
> (e.g. swraid).

Sorry, it looks like I did not get you the whole output earlier, but I
can try what you suggested:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = ma 00000000 pa 55555000
 [<c02691f1>] blkio_schedule+0x126/0x14c
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c02690cb>] blkio_schedule+0x0/0x14c
 [<c01071f1>] kernel_thread_helper+0x5/0xb
Oops: 0000 [#1]
Modules linked in: ipt_MASQUERADE iptable_nat ip_conntrack ip_tables qla2300 qla2xxx scsi_transport_fc mptscsih mptbase
CPU:    0
EIP:    0061:[<00000000>]    Not tainted VLI
EFLAGS: 00010282   (2.6.11-xen0-up)
EIP is at 0x0
eax: 00000000   ebx: f5ec9b70   ecx: f3e13b64   edx: 00000000
esi: 00000000   edi: c044540c   ebp: f7d8dfc0   esp: f7d8df84
ds: 007b   es: 007b   ss: 0069
Process xenblkd (pid: 730, threadinfo=f7d8c000 task=f7d43020)
Stack: c0217f7d f5ec9b70 f3fee6f0 f7d8c000 c02691f1 f5ec9b70 00000010 00000000
       f7d43020 c01147b5 00000000 00000000 fbffc000 00000000 f7d8c000 00000000
       f7d43020 c01147b5 00100100 00200200 00000000 00000000 00000000 c02690cb
Call Trace:
 [<c0217f7d>] blk_run_queue+0x24/0x47
 [<c02691f1>] blkio_schedule+0x126/0x14c
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c02690cb>] blkio_schedule+0x0/0x14c
 [<c01071f1>] kernel_thread_helper+0x5/0xb
Code: Bad EIP value.
> Could you try again, but replace calls to blk_run_queue(plugged_queue)
> in blkback.c with:
>
>   if ( plugged_queue->unplug_fn )
>       plugged_queue->unplug_fn(plugged_queue);
>
> This looks like a better match with what various other drivers do
> (e.g. swraid).

This patch is required to make it work with LVM. 2.0-testing and
unstable will be updated shortly...

Ian
Kurt Garloff wrote:
> Hi Niv,
>
> On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> > Although the usual answer to "which scheduling algorithm is best?" is
> > almost always "it depends on the workload", it was suggested to me
> > that CFQ was still the best option to go with. What do people feel
> > about that? (Or is AS going to remain the default?)
>
> This is a different discussion.

Yes, I did change the subject a little ;).

> But, yes, I would agree that CFQ (v3) is the best default choice.

Yep, even though some of the complications in the Xen environment (as
you point out below) will have to be addressed.

> Jens, should we maybe make sure that the blockback driver uses
> different (fake) UIDs for the domains that it serves, to provide
> fairness between them? The next step would be to allow tweaking of I/O
> priorities.

> It's part of 2.6.11:
>
> garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
> noop anticipatory deadline [cfq]

I just saw Jens' reply as well. This is much goodness :). Very handy
indeed!

thanks,
Nivedita
On Thursday 31 March 2005 15:32, Ian Pratt wrote:
> > Could you try again, but replace calls to
> > blk_run_queue(plugged_queue) in blkback.c with:
> >
> >   if ( plugged_queue->unplug_fn )
> >       plugged_queue->unplug_fn(plugged_queue);

OK, the changes worked for me, but I still see some added latency here
(much better than before, though):

        reqsze      MB/sec    svcmt

xenU      16k      6266.67     1.25
          32k     12618.67     1.20
          64k     25002.67     1.28
         128k     49322.67     1.35
         256k     58538.67     3.15

xen0      16k     13818.67     1.15
          32k     27573.33     1.16
          64k     54784.00     1.16
         128k     58581.33     2.18
         256k     58453.33     4.38

noXen     16k     58679.19     0.27
          32k     58453.33     0.54
          64k     58713.04     1.08
         128k     58174.09     2.17
         256k     58820.07     4.36
>         reqsze      MB/sec    svcmt
>
> xen0      16k     13818.67     1.15
>           32k     27573.33     1.16
>           64k     54784.00     1.16
>          128k     58581.33     2.18
>          256k     58453.33     4.38

These figures for xen0 are interesting. It's odd that we tail off so
badly for short requests. What interrupt rates are occurring when you do
these tests?

Thanks,
Ian
On Thursday 31 March 2005 16:36, Ian Pratt wrote:
> These figures for xen0 are interesting. It's odd that we tail off so
> badly for short requests. What interrupt rates are occurring when you
> do these tests?

I just ran again, and for some reason it looks fine now... I have no
idea what I did to get the lower numbers initially; perhaps an
inadvertent I/O scheduler change. Service commit times are 0.28 ms and I
can drive ~58 MB/sec with just 16k requests on xen0. I'll do some more
tests to get a more consistent picture.

-Andrew
On Thu, Mar 31 2005, Keir Fraser wrote:
> I wonder if blk_run_queue() is not the right thing to call. For
> example, it ignores whether the queue has been forcibly stopped by the
> underlying driver and doesn't check whether there are any requests
> that actually require pushing. Plus, various drivers (swraid and
> probably lvm) have their own unplug function, and blk_run_queue
> doesn't handle that.
>
> Could you try again, but replace calls to blk_run_queue(plugged_queue)
> in blkback.c with:
>
>   if ( plugged_queue->unplug_fn )
>       plugged_queue->unplug_fn(plugged_queue);
>
> This looks like a better match with what various other drivers do
> (e.g. swraid).

Yes, you are right, you really want to just unplug it. That should work
correctly in all cases. Remember that ->unplug_fn must not be called
with any locks held.

--
Jens Axboe
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
> Here's Keir's updated patch:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch <at> 424c1abd7LgWMiaskLEEAAX7ffdkXQ
>
> Which is based on this earlier patch from you:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch <at> 424bba4091aV1FuNksY_4w_z4Tvr3g

I have applied the patch in blkback.c for xen0 and have gotten good
results now. I have tested two systems, one with a standard IDE disk and
another with two SATA disks.

I stumbled over this issue when I was doing filesystem I/O and wanted to
check the efficiency of xen-linux. It was then that I went to raw I/O on
block devices and found that it didn't perform as I hoped.

Now I have switched back to the filesystem operations. I do this by
copying a "/usr" subtree from a Slackware 10.0 installation containing
about 750 MB in 2200 directories and 37000 files. Copying these files
with the target directory on the same device as the source directory, I
get between 90 and 93% of the Dom0 performance when I work in DomU. When
copying from a directory on one device into a directory on another
device, DomU performance lags further behind Dom0: it is only 50 to 60
percent of the Dom0 performance, and less than it is when using only one
disk. I found that the sum of the busy percentages of the two disks, as
reported by iostat in Dom0, is always slightly above 100%. Does this
reflect that the reading and the writing both go through the VBD driver?
Neither device is ever 100% busy.

Any explanations?

Thanks in advance

Peter
> Now I have switched back to the filesystem operations. I do this by
> copying a "/usr" subtree from a Slackware 10.0 installation containing
> about 750 MB in 2200 directories and 37000 files. When copying from a
> directory on one device into a directory on another device, DomU
> performance lags further behind Dom0: it is only 50 to 60 percent of
> the Dom0 performance, and less than it is when using only one disk. I
> found that the sum of the busy percentages of the two disks, as
> reported by iostat in Dom0, is always slightly above 100%. Does this
> reflect that the reading and the writing both go through the VBD
> driver? Neither device is ever 100% busy.

The latest 2.0-testing tree has some further blk queue plugging
enhancements, along with a fix for another nasty performance bug. It
would be interesting to know whether that improves things.

It's possible that the blkring currently just isn't big enough if you're
trying to drive multiple devices with independent requests.

Ian
> I just ran again, and for some reason it looks fine now... I have no
> idea what I did to get the lower numbers initially; perhaps an
> inadvertent I/O scheduler change. Service commit times are 0.28 ms and
> I can drive ~58 MB/sec with just 16k requests on xen0. I'll do some
> more tests to get a more consistent picture.

I still experience bad performance in domU with the latest xen-testing
dom0. Here's my setup:

Xen       : 2.0.5
Dom0      : 2.6.11-xen-testing (20050401 ~22h CEST) running Debian Sarge
DomU      : 2.6.10-xen-2.0.5 (8G LVM-backed VBDs exported as hda1) running Gentoo
Processor : AthlonXP 1800+
Chipset   : VIA KT600
Drive     : Seagate ST380013AS 80G SATA

And my results:

Dom0 : 51 MB/s
DomU : 36 MB/s

I've tried with request sizes from 128k to 1024k, reading the entire
volume, and always obtained the same results. Changing the scheduler on
Dom0 and/or DomU doesn't change anything.

I can give you more info if needed.

--
Cédric Schieli
There have been some changes to the frontend driver too: you might want
to try using the 2.0-testing kernel in domU as well.

Also, a really nasty CPU performance bug got fixed earlier this evening,
so you should make sure you have the latest tree.

Ian

> I still experience bad performance in domU with the latest xen-testing
> dom0.
>
> Dom0 : 51 MB/s
> DomU : 36 MB/s
I've just tried with the latest testing tree Dom0 and DomU and got the
same results.

On Saturday 02 April 2005 at 00:22 +0100, Ian Pratt wrote:
> There have been some changes to the frontend driver too: you might
> want to try using the 2.0-testing kernel in domU as well.
>
> Also, a really nasty CPU performance bug got fixed earlier this
> evening, so you should make sure you have the latest tree.
> > > Xen       : 2.0.5
> > > Dom0      : 2.6.11-xen-testing (20050401 ~22h CEST) running Debian Sarge
> > > DomU      : 2.6.10-xen-2.0.5 (8G LVM-backed VBDs exported as hda1) running Gentoo
> > > Processor : AthlonXP 1800+
> > > Chipset   : VIA KT600
> > > Drive     : Seagate ST380013AS 80G SATA
> > >
> > > And my results:
> > >
> > > Dom0 : 51 MB/s
> > > DomU : 36 MB/s
> > >
> > > I've tried with request sizes from 128k to 1024k, reading the
> > > entire volume, and always obtained the same results. Changing the
> > > scheduler on Dom0 and/or DomU doesn't change anything.

Are you sure you're reading from the exact same part of the disk in both
instances? How are you doing the bandwidth measurements? 'dd'?

Ian
> Are you sure you're reading from the exact same part of the disk in
> both instances? How are you doing the bandwidth measurements? 'dd'?

I have this line in my DomU config:

disk = [ 'phy:vg/gentoo-root,hda1,w', 'phy:vg/gentoo-swap,hda2,w' ]

I make my measurements with:

Dom0 : dd if=/dev/vg/gentoo-root of=/dev/null bs={128|256|...}k
DomU : dd if=/dev/hda1 of=/dev/null bs={128|256|...}k

In all cases I get the same results: 50-52 MB/s on Dom0, 34-37 MB/s on
DomU. I've tried every combination of schedulers.

I will try with the latest xen-testing hypervisor (I still use 2.0.5 for
the moment), but I don't think this should have much impact.
Cédric Schieli <cedric <at> schieli.dyndns.org> writes:
> I've just tried with the latest testing tree Dom0 and DomU and got the
> same results.

I just stumbled across the fact that you are using a SATA disk, Cédric.
This is exactly the "dd" behaviour that my system containing SATA disks
still shows, but it applies only to "dd" (which, admittedly, is
read-only). It does not apply to the performance figures I got when
copying my "/usr" tree - as described in a previous post here - from one
location on the disk to another location on the same disk (which, of
course, is combined read-write on the same device). Hence it is possible
that my limited performance copying from one disk to another is in fact
an effect of reduced read performance in DomU on a SATA disk.

I suspect that this might be an effect specific to SATA disks. I will
verify this on Monday - when I have access to my computers in the office
- by repeating it on a system with two IDE disks. I will report back
then, if your problem is still open, and will describe the exact
configuration of the systems (motherboard, I/O controller, etc.).

Peter
I can confirm the problem only occurs on SATA. I've added an old IDE
UDMA66 drive, created an LVM volume from it and ran the same dd tests:

Dom0 : 12 MB/s
DomU : 12 MB/s

> I suspect that this might be an effect specific to SATA disks. I will
> verify this on Monday - when I have access to my computers in the
> office - by repeating it on a system with two IDE disks.
> I can confirm the problem only occurs on SATA. I've added an old IDE
> UDMA66 drive, created an LVM volume from it and ran the same dd tests:
>
> Dom0 : 12 MB/s
> DomU : 12 MB/s

SATA works fine for me on 2.0-testing. I get 50 MB/s reading from a raw
partition in both cases using:

time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024

Can you try a raw partition rather than LVM?

Thanks,
Ian
> SATA works fine for me on 2.0-testing. I get 50 MB/s reading from a
> raw partition in both cases using:
>
> time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024

I've tried with a raw partition (the same one that holds the LVM volume)
and got the same results: 51 MB/s on Dom0 and 37 MB/s on DomU.

I don't know if it is of importance, but I need to add
ignorebiostables=1 to my boot parameters in order to make the SATA work
(the kernel hangs on drive detection without it). The SATA controller is
a VIA one.

Cédric Schieli
> > SATA works fine for me on 2.0-testing. I get 50 MB/s reading from a
> > raw partition in both cases using:
> > time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024
>
> I've tried with a raw partition (the same one that holds the LVM
> volume) and got the same results: 51 MB/s on Dom0 and 37 MB/s on DomU.
>
> I don't know if it is of importance, but I need to add
> ignorebiostables=1 to my boot parameters in order to make the SATA
> work (the kernel hangs on drive detection without it). The SATA
> controller is a VIA one.

It doesn't sound like Xen is too happy on your system, but it's not
clear how this would explain the performance difference between dom0 and
domU.

When the IOAPIC patches are checked in it will be interesting to see
whether this fixes it. Try the unstable tree in a week or so.

Best,
Ian
Some simple, non-scientific additions to the performance numbers.
IBM x335 / MPT SCSI.

Previously, on 2.0.5/Testing with 2.6.10:

[nic@stateless:~/sys/xen] sudo hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads:   2884 MB in 2.00 seconds = 1442.00 MB/sec
 Timing buffered disk reads:  100 MB in 3.05 seconds = 32.79 MB/sec

Not completely happy with a buffered disk read figure of ~33 MB/s. I'm
putting together a new x205 server with Xen later today; I'll try to do
some native vs Xen testing while I'm at it.

dom0:
[nic@stateless:~/tmp] time sudo cp db-svn.tgz db-svn-bak.tgz
real    0m13.058s
user    0m0.030s
sys     0m0.530s

domU:
[nic@base:/export/bak] time sudo cp db-svn.tgz db-svn-bak.tgz
real    0m23.574s
user    0m0.010s
sys     0m0.060s

[nic@stateless:~/tmp] ls -l db-svn.tgz
-rw-r--r--  1 nic nic 188247603 2005-04-04 21:06 db-svn.tgz

With today's 2.0.6/Testing on 2.6.11.6:

[nic@stateless:~] sudo hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads:   2748 MB in 2.00 seconds = 1374.00 MB/sec
 Timing buffered disk reads:  102 MB in 3.00 seconds = 34.00 MB/sec

[nic@stateless:~/tmp] time sudo cp db-svn.tgz db-svn-bak.tgz
real    0m10.468s
user    0m0.010s
sys     0m0.070s

[nic@base:/export/bak] time sudo cp db-svn.tgz db-svn-bak.tgz
real    0m11.243s
user    0m0.000s
sys     0m0.040s

Both filesystems are based on XFS/LVM2. These numbers are from one run
right after boot, with (in the domU case) just one domU running.

So, a definite improvement.

Nicholas
I am sorry to return to this issue after quite a long interruption.

As I mentioned in an earlier post, I came across this problem when I was
testing filesystem performance. After the problems with raw sequential
I/O appeared to have been fixed in the testing release, I turned back to
my original problem.

I did a simple test that, despite its simplicity, seems to put the I/O
subsystem under considerable stress. I took the /usr tree of my system
and copied it five times into different directories on a slice of
disk 1. This tree consists of 36000 files holding about 750 MB of data.
Then I started to copy each of these copies recursively onto disk 2
(each to its own location on that disk, of course). I ran these copies
in parallel; the processes took about 6 to 7 minutes in Dom0, while they
needed between 14.6 and 15.9 minutes in DomU.

Essentially, this means that under this heavy I/O load I am back to the
40% ratio between DomU and Dom0 I/O performance that I initially
reported. This may just be coincidence, but it is probably worth
mentioning.

I monitored the disk and block-I/O activity with iostat. The full output
is too large to post here, so I will only include a few representative
lines. The first snapshot shows the activity while copying in DomU,
during a phase of relatively high throughput (DomU):

Device:  rrqm/s   wrqm/s     r/s    w/s    rsec/s    wsec/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hde        0.00  2748.00    1.60  71.20     12.80  22561.60      6.40  11280.80    310.09      1.78  23.96   4.73  34.40
hdg     2571.00     5.00  126.80   9.60  21580.80    115.20  10790.40     57.60    159.06      5.48  40.38   6.61  90.20

avg-cpu:  %user   %nice  %system  %iowait   %idle
           0.20    0.00     6.20     0.20   93.40

This is a snapshot of a phase with relatively low throughput (DomU):

Device:  rrqm/s   wrqm/s     r/s    w/s    rsec/s    wsec/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
hde        0.00   676.40    0.00  33.00      0.00   5678.40      0.00   2839.20    172.07      1.76  53.45   4.91  16.20
hdg      335.80    11.00  315.00   3.40   5206.40    115.20   2603.20     57.60     16.71      4.15  13.02   2.76  87.80

avg-cpu:  %user   %nice  %system  %iowait   %idle
           0.20    0.00     9.00     0.00   90.80

I suspect that the reported iowait in the CPU usage is not entirely
correct, but I am not sure about it.

The next two snapshots show iostat output during the copying in Dom0.
Again, the first was taken in a phase of relatively high throughput
(Dom0):

Device:  rrqm/s   wrqm/s     r/s    w/s    rsec/s    wsec/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm   %util
hde        0.00  5845.40    1.40  110.20    11.20  47812.80      5.60  23906.40    428.53    105.96  772.63   8.96  100.00
hdg       46.20    24.80  389.80    2.20  47628.80    216.00  23814.40    108.00    122.05      7.12   18.23   3.30  129.40

avg-cpu:  %user   %nice  %system  %iowait   %idle
           2.40    0.00    40.20    57.40    0.00

The next was taken in a phase of relatively low throughput (Dom0):

Device:  rrqm/s   wrqm/s     r/s    w/s    rsec/s    wsec/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm   %util
hde        0.00   903.40    0.20  106.80     3.20   7972.80      1.60   3986.40     74.54     20.77  217.91   4.06   43.40
hdg        0.00    24.00  746.60    1.20   9302.40    200.00   4651.20    100.00     12.71      4.96    6.67   1.34  100.00

avg-cpu:  %user   %nice  %system  %iowait   %idle
           3.40    0.00    44.00    52.60    0.00

The problem seems to be the reading. Device hde, which contains the
slice the data is copied onto, is almost never really busy when the copy
runs in DomU. The ratio of kB/s written to utilisation seems to show
that writing from DomU is just as efficient as writing from Dom0
(writing can be buffered in both cases, after all). Yet the information
on reading shows a different picture.

The block I/O layer merges requests throughout, resulting in request
sizes that are approximately equal in both cases. Yet the service times
for DomU requests are about twice those for Dom0 requests.

I do not know whether such a scenario is simply inadequate for virtual
systems, at least under Xen. We are thinking about running a mail
gateway on top of a protected and secured Dom0 system, and potentially
offering other network services in separate domains. We want to avoid
corruption of Dom0 while being able to offer "insecure" services in
non-privileged domains. We know that mail servicing can potentially put
an intense load onto the filesystem - admittedly more on inodes (create
and delete) than on data throughput.

Do I simply have to accept that, under heavy I/O load, domains using
VBDs to access storage devices will lag behind Dom0 and native Linux
systems, or is there a chance to fix this?

My reported test was done on a Fujitsu-Siemens RX100 system with a
2.0 GHz Celeron CPU and a total of only 256 MB of memory; Dom0 had
128 MB and DomU 100 MB. The disks were plain IDE disks. I did the same
test on a system with 1.25 GB of RAM, with both domains having 0.5 GB of
memory. It contains SATA disks, and the results are essentially the
same; the only difference is that both processes are slower due to lower
random-access throughput from the disks.

Any advice or help?

Thanks in advance

Peter
> I ran these copies in parallel; the processes took about 6 to 7
> minutes in Dom0, while they needed between 14.6 and 15.9 minutes in
> DomU.
>
> Essentially, this means that under this heavy I/O load I am back to
> the 40% ratio between DomU and Dom0 I/O performance that I initially
> reported. This may just be coincidence, but it is probably worth
> mentioning.

It's possible that the dom0 doing prefetch, as well as the domU, is
messing up random I/O performance. Do the iostat numbers suggest dom0 is
reading more data overall when doing it on behalf of a domU?

We'll need a simpler way of reproducing this if any headway is to be
made debugging it. It might be worth writing a program to do
pseudo-random I/O reads to a partition, both in DIRECT and normal mode,
then run it in dom0 and domU.

[Chris: you have such a program already, right? Can you post it, thanks]

Ian
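Along the lines Ian suggests, here is a rough sketch of such a tester
(this is not Chris's program; the device path, block size and read count
are example values only). It reads fixed-size blocks at pseudo-random
offsets from a partition, with or without O_DIRECT, and reports the
achieved bandwidth.

/* Pseudo-random read tester: ./randread /dev/hdb6 [direct] */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>

#define BLKSZ   (64 * 1024)
#define NREADS  1024

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/hdb6";
    int use_direct  = (argc > 2) && (strcmp(argv[2], "direct") == 0);
    int fd = open(dev, O_RDONLY | (use_direct ? O_DIRECT : 0));
    void *buf;
    off_t dev_bytes, off;
    struct timeval t0, t1;
    double secs;
    int i;

    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires a suitably aligned buffer. */
    if (posix_memalign(&buf, 4096, BLKSZ) != 0) return 1;

    dev_bytes = lseek(fd, 0, SEEK_END);
    if (dev_bytes < BLKSZ) { fprintf(stderr, "device too small\n"); return 1; }

    srand(42);                        /* fixed seed => repeatable offsets */

    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++) {
        /* Pick a random block-aligned offset within the device. */
        off = ((off_t)rand() % (dev_bytes / BLKSZ)) * BLKSZ;
        if (pread(fd, buf, BLKSZ, off) != BLKSZ) { perror("pread"); return 1; }
    }
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d x %dKB random reads in %.2fs (%.2f MB/s)\n",
           NREADS, BLKSZ / 1024, secs, NREADS * (BLKSZ / 1048576.0) / secs);

    free(buf);
    close(fd);
    return 0;
}

Running the same binary against the same partition in dom0 and in a domU
(once in normal buffered mode, once with the "direct" argument) should
show whether the gap is in the raw VBD path or in the guest-side caching
and readahead.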