Hello again, I would like to come back to that thing... sorry that I did not have the time up to now.

We (now) speak about:

* Xen 4.1.2
* Dom0 is Jeremy's 2.6.32.46 64 bit
* DomU in question is now 3.1.2 64 bit
* Same thing if DomU is also 2.6.32.46
* DomU owns two PCI cards (DVB-C) that do DMA
* Machine has 8GB, Dom0 pinned at 512MB

As compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It
will be "close to normal" if I reduce the memory used to 4GB.

As you can see from the attachment, you once had an idea. So should we try to find something...?

Carsten.

-----Original Message-----
To: konrad.wilk <konrad.wilk@oracle.com>;
Cc: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>;
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.06.2011 23:17
Subject: AW: Re: Re: Re: AW: Re: [Xen-devel] AW: Load increase after memory upgrade?

> Let's first do the c) experiment as that will likely explain your load average increase.
> ...
> > c). If you want to see if the fault here lies in the bounce buffer being used more
> > often in the DomU b/c you have 8GB of memory now and you end up using more pages
> > past 4GB (in DomU), I can cook up a patch to figure this out. But an easier way is
> > to just do (on the Xen hypervisor line): mem=4G and that will make it think you only
> > have 4GB of physical RAM. If the load comes back to the normal "amount" then the
> > likely culprit is that and we can think on how to fix this.

You are on the right track. Load was going down to "normal" 10% when reducing
Xen to 4GB by the parameter. Load seems to be still a little, little bit lower
with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
before.
Konrad Rzeszutek Wilk
2011-Nov-25 18:42 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to that thing... sorry that I did not have the time up to now.
>
> We (now) speak about:
>
> * Xen 4.1.2
> * Dom0 is Jeremy's 2.6.32.46 64 bit
> * DomU in question is now 3.1.2 64 bit
> * Same thing if DomU is also 2.6.32.46
> * DomU owns two PCI cards (DVB-C) that do DMA
> * Machine has 8GB, Dom0 pinned at 512MB
>
> As compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It
> will be "close to normal" if I reduce the memory used to 4GB.

That is in the dom0 or just in general on the machine?

> As you can see from the attachment, you once had an idea. So should we try to find something...?

I think that was to instrument swiotlb to give an idea of how often it is called
and basically have a metric of its load. And from there figure out if the issue
is that:

1). The drivers allocate/bounce/deallocate buffers on every interrupt
    (bad, the driver should be using some form of DMA pool, and most of the
    ivtv ones do that).

2). The buffers allocated to the drivers are above the 4GB mark and we end
    up bouncing them needlessly. That can happen if the dom0 has most of
    the precious memory under 4GB. However, that is usually not the case,
    as the domain is usually allocated from the top of the memory. The
    fix for that was to set dom0_mem=max:XX... but with Dom0 kernels
    before 3.1, the parameter would be ignored, so you had to use
    'mem=XX' on the Linux command line as well.

3). Where did you get the load values? Was it dom0? Or domU?
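For reference, the two settings mentioned above would be combined roughly like this in a legacy GRUB entry; the sizes, file names and paths are only examples, not taken from the thread:

    title Xen 4.1.2 / Dom0 2.6.32.46
        root (hd0,0)
        kernel /boot/xen-4.1.2.gz dom0_mem=max:512M
        module /boot/vmlinuz-2.6.32.46-xen mem=512M root=/dev/sda1 ro console=tty0
        module /boot/initrd-2.6.32.46-xen.img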
I got the values in DomU. I will have:

- approx. 5% load in DomU with the 2.6.34 Xenified kernel
- approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with one card attached
- approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with two cards attached

I looked through my old mails from you and you already explained the necessity of double
bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
Xenified kernel not have this kind of issue?

The driver in question is nearly identical between the two kernel versions. It is in
drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
question is:

    /* allocate and init buffers */
    av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);
    if (!av7110->debi_virt)
            goto err_saa71466_vfree_4;

isn't it? I think the cards are constantly transferring the received stream through DMA.

I have set dom0_mem=512M by the way, shall I change that in some way?

I can try out some things, if you want me to. But I have no idea what to do and where to
start, so I rely on your help...

Carsten.
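Since the question of why a coherent allocation should not need bouncing keeps coming up, here is a minimal illustration of the two DMA styles involved; this is not code from the av7110 driver, and the demo_* names are made up:

    /* Sketch: coherent vs. streaming DMA for a 32-bit-only PCI card.
     * demo_* names are placeholders, not part of the real driver. */
    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static void demo_dma_styles(struct pci_dev *pdev, void *data, size_t len)
    {
            dma_addr_t coherent_bus, stream_bus;
            void *buf;

            /* Coherent allocation: the DMA API hands back memory the card can
             * address (below 4GB here), so it never needs bouncing - this is
             * what pci_alloc_consistent() in av7110 does once at init time. */
            buf = pci_alloc_consistent(pdev, 8192, &coherent_bus);
            if (!buf)
                    return;

            /* Streaming mapping: 'data' can live anywhere in guest memory; if
             * it sits above what the card can reach, swiotlb copies it into a
             * bounce buffer on map/sync - the cost the thread is chasing. */
            stream_bus = pci_map_single(pdev, data, len, PCI_DMA_TODEVICE);
            if (!pci_dma_mapping_error(pdev, stream_bus))
                    pci_unmap_single(pdev, stream_bus, len, PCI_DMA_TODEVICE);

            pci_free_consistent(pdev, 8192, buf, coherent_bus);
    }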
To add (read from some munin statistics I made over the time):

- with load I mean the %CPU of xentop
- there is no change in CPU usage of the DomU or Dom0
- xenpm shows that the core dedicated to that DomU is doing more work

Also I need to say that the reduction to 4GB was performed by the Xen parameter.

Carsten.
Konrad Rzeszutek Wilk
2011-Nov-28 15:28 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I will have:
>
> - approx. 5% load in DomU with the 2.6.34 Xenified kernel
> - approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with one card attached
> - approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with two cards attached

HA! I just wonder if the issue is that the reporting of CPU time spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops code when
it came to accounting of CPU time.

> I looked through my old mails from you and you already explained the necessity of double
> bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same - both
use the generic SWIOTLB, which has not changed for years.

> The driver in question is nearly identical between the two kernel versions. It is in
> drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
> question is:
>
>     /* allocate and init buffers */
>     av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates it during init and uses it.

>     if (!av7110->debi_virt)
>             goto err_saa71466_vfree_4;
>
> isn't it? I think the cards are constantly transferring the received stream through DMA.

Yeah, and that memory is set aside for the life of the driver. So there
should be no bounce buffering happening (as it allocated the memory below
the 4GB mark).

> I have set dom0_mem=512M by the way, shall I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
Konrad Rzeszutek Wilk
2011-Nov-28 15:30 UTC
Re: Load increase after memory upgrade (part2)
On Sat, Nov 26, 2011 at 10:14:08AM +0100, Carsten Schiers wrote:
> To add (read from some munin statistics I made over the time):
>
> - with load I mean the %CPU of xentop
> - there is no change in CPU usage of the DomU or Dom0

Uhh, which metric are you using for that? CPU usage...? Is this when you change
the DomU or the amount of memory the guest has? This is not the load number
(xentop value)?

> - xenpm shows that the core dedicated to that DomU is doing more work
>
> Also I need to say that the reduction to 4GB was performed by the Xen parameter.
>
> Carsten.
On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > I looked through my old mails from you and you already explained the necessity of double
> > bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> > Xenified kernel not have this kind of issue?
>
> That is a puzzle. It should not. The code is very much the same - both
> use the generic SWIOTLB, which has not changed for years.

The swiotlb-xen used by classic-xen kernels (which I assume is what
Carsten means by "Xenified") isn't exactly the same as the stuff in
mainline Linux; it's been heavily refactored for one thing. It's not
impossible that mainline is bouncing something it doesn't really need
to.

It's also possible that the DMA mask of the device is different/wrong in
mainline, leading to such additional bouncing.

I guess it's also possible that the classic-Xen kernels are playing fast
and loose by not bouncing something they should (although if so they
appear to be getting away with it...) or that there is some difference
which really means mainline needs to bounce while classic-Xen doesn't.

Ian.
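To illustrate the DMA-mask point, this is roughly how a driver declares what its device can address; a minimal sketch under that assumption, not taken from the budget_av/av7110 driver:

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int example_probe(struct pci_dev *pdev)
    {
            /* Declare that the device can only address 32 bits. A mask wider
             * than the hardware really supports would hand the card addresses
             * above 4GB it cannot reach; a mask narrower than needed would
             * cause extra bouncing of streaming mappings. */
            if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            /* Coherent allocations honour a separate mask. */
            if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            return 0;
    }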
Hi,

let me try to explain a bit more. Here you see the output of my xentop munin graph
for a week. Only take a look at the bluish buckle. Notice the small step in front?
So it's the CPU permille used by the DomU that owns the cards. The small buckle is
when I only put in one PCI card. Afterwards it's a constantly noticeably higher load.
See that Dom0 (green) is not impacted. I am back to the Xenified kernel, as you can see.

In the next picture you see the output of xenpm visualized. So this might be an
indicator that really something happens. It's only the core that I dedicated to that
DomU. I have a three-core AMD CPU by the way.

In the CPU usage of the Dom0, there is nothing to see. In the CPU usage of the DomU,
there is also not much to see, eventually a very slight change of mix. There is a
slight increase in sleeping jobs at the time slot in question; I guess nothing we
can directly map to the issue.

If you need other charts, I can try to produce them.

BR, Carsten.
Konrad Rzeszutek Wilk
2011-Nov-28 16:45 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > > I looked through my old mails from you and you already explained the necessity of double
> > > bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> > > Xenified kernel not have this kind of issue?
> >
> > That is a puzzle. It should not. The code is very much the same - both
> > use the generic SWIOTLB, which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
being done. The alloc_coherent will allocate a nice page underneath the 4GB
mark and give it to the driver. The driver can use it as it wishes and there
is no need to bounce buffer.

But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
as if it is using map_single, which would be taking the memory out of the
pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
[Note: I looked at the 2.6.18 hg tree for classic; the 2.6.34 one is probably
much improved, so let me check that.]

Carsten, let me prep up a patch that will print some diagnostic information
during runtime - to see how often it does the bounce, the usage, etc.

> It's also possible that the DMA mask of the device is different/wrong in
> mainline, leading to such additional bouncing.

If one were to use map_page and such - yes. But the alloc_coherent bypasses
that and ends up allocating it right under the 4GB mark (or rather it allocates
based on the dev->coherent_dma_mask and swizzles the MFNs as required).

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

<nods> Could be very well.

> Ian.
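As a rough picture of the streaming-map side being discussed, the decision swiotlb-xen has to make looks something like the sketch below; this is simplified illustrative code, not the actual mainline implementation, and phys_to_bus()/bounce_into_pool() are stand-ins for the real helpers:

    /* Simplified sketch of a streaming DMA map with bounce buffering.
     * phys_to_bus() and bounce_into_pool() are placeholders for the real
     * swiotlb-xen helpers. */
    dma_addr_t sketch_map_page(struct device *dev, struct page *page,
                               unsigned long offset, size_t size)
    {
            phys_addr_t phys = page_to_phys(page) + offset;
            dma_addr_t bus = phys_to_bus(phys);    /* pfn -> mfn translation */

            /* If the bus address already fits the device's DMA mask (and the
             * underlying MFNs are contiguous), hand it straight to the card. */
            if (dma_capable(dev, bus, size))
                    return bus;

            /* Otherwise copy the data into the pre-allocated pool below 4GB
             * and give the device the pool address instead - this is the
             * "bounce" that the debug output later in the thread counts. */
            return bounce_into_pool(dev, phys, size);
    }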
On 11/28/11 16:40, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
>> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
>
>>> I looked through my old mails from you and you already explained the necessity of double
>>> bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
>>> Xenified kernel not have this kind of issue?
>>
>> That is a puzzle. It should not. The code is very much the same - both
>> use the generic SWIOTLB, which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Please excuse me if I'm completely mistaken; my only point of reference is
that we recently had to backport
<http://xenbits.xensource.com/hg/linux-2.6.18-xen.hg/rev/940>.

> It's also possible that the DMA mask of the device is different/wrong in
> mainline, leading to such additional bouncing.

dma_alloc_coherent() -- which I guess is the precursor of
pci_alloc_consistent() -- asks xen_create_contiguous_region() to back the
vaddr range with frames machine-addressable inside the device's DMA mask.
xen_create_contiguous_region() seems to land in a XENMEM_exchange hypercall
(among others). Perhaps this extra layer of indirection allows the driver to
use low pages directly, without bounce buffers.

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

I'm sorry if what I just posted is painfully stupid. I'm taking the risk for
the 1% chance that it could be helpful.

Wrt. the idle time accounting problem, after Niall's two pings, I'm also
waiting for a verdict, and/or for myself finding the time to fish out the
current patches.

Laszlo
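A rough sketch of the classic-Xen coherent allocation flow described above, with simplified names and error handling; the real code is in arch/i386/kernel/pci-dma-xen.c and the exact signature of xen_create_contiguous_region() varies between trees:

    /* Illustrative sketch of a classic-Xen style dma_alloc_coherent() path;
     * not the code from pci-dma-xen.c. */
    void *sketch_alloc_coherent(struct device *dev, size_t size,
                                dma_addr_t *dma_handle, gfp_t gfp)
    {
            unsigned int order = get_order(size);
            unsigned long vstart = __get_free_pages(gfp, order);

            if (!vstart)
                    return NULL;

            /* Exchange the backing frames (XENMEM_exchange under the hood) so
             * that the region is machine-contiguous and sits within the
             * device's coherent DMA mask - e.g. below 4GB for a 32-bit card.
             * After this, the driver can DMA to the buffer directly, with no
             * bounce buffering for its lifetime. */
            if (xen_create_contiguous_region(vstart, order,
                                             fls64(dev->coherent_dma_mask)) < 0) {
                    free_pages(vstart, order);
                    return NULL;
            }

            *dma_handle = virt_to_bus((void *)vstart);
            return (void *)vstart;
    }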
>>> On 28.11.11 at 17:45, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But I can't find the implementation of that in the classic Xen-SWIOTLB.

linux-2.6.18-xen.hg/arch/i386/kernel/pci-dma-xen.c:dma_alloc_coherent().

Jan
I attached the actually used 2.6.34 file here, if that helps.

BR, C.
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Yes, it's a 2.6.34 kernel with Andrew Lyon's backported patches found here:
http://code.google.com/p/gentoo-xen-kernel/downloads/list

GrC.
> > - with load I mean the %CPU of xentop
> > - there is no change in CPU usage of the DomU or Dom0
>
> Uhh, which metric are you using for that? CPU usage...? Is this when you change
> the DomU or the amount of memory the guest has? This is not the load number
> (xentop value)?

I had a quick look into the munin plugin. It reads the output of "xm li",
the Time in seconds, and normalizes it. But the effect is also visible in the
CPU(%) column of xentop, if the DomU is on higher load.

BR, C.
> Carsten, let me prep up a patch that will print some diagnostic information
> during runtime - to see how often it does the bounce, the usage, etc.

Yup, looking forward to it. I can include it into any kernel. 2.6.18 would be a
bit difficult though, as the driver pack isn't compatible any longer... so I'd
prefer 2.6.34 Xenified vs. 3.1.2 pvops.

BR, C.
On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > mainline Linux; it's been heavily refactored for one thing. It's not
> > impossible that mainline is bouncing something it doesn't really need
> > to.
>
> The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
> being done. The alloc_coherent will allocate a nice page underneath the 4GB
> mark and give it to the driver. The driver can use it as it wishes and there
> is no need to bounce buffer.

Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
subset of swiotlb is in use then; all the bouncing stuff _should_ be
idle/unused -- but has that been confirmed?
Konrad Rzeszutek Wilk
2011-Nov-29 15:33 UTC
Re: Load increase after memory upgrade (part2)
On Tue, Nov 29, 2011 at 10:23:18AM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> > The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
> > being done. The alloc_coherent will allocate a nice page underneath the 4GB
> > mark and give it to the driver. The driver can use it as it wishes and there
> > is no need to bounce buffer.
>
> Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> subset of swiotlb is in use then; all the bouncing stuff _should_ be
> idle/unused -- but has that been confirmed?

Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
Now I just need to find a moment to write it :-)
Konrad Rzeszutek Wilk
2011-Dec-02 15:23 UTC
Re: Load increase after memory upgrade (part2)
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then; all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
>
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done! Carsten, can you please patch your kernel with this hacky patch, and when
you have booted the new kernel, just do

    modprobe dump_swiotlb

It should give an idea of how many bounces are happening, coherent allocations,
syncs, and so on... along with the last driver that did those operations.
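The hacky patch itself is not included in the archived thread. Judging only from the output that follows, it presumably keeps per-operation counters in the swiotlb paths plus a kernel thread that prints them every five seconds. A very rough sketch of that idea; the swiotlb_dbg_*/dump_swiotlb names are guesses, not Konrad's actual code:

    /* Hypothetical sketch of swiotlb instrumentation: counters bumped from the
     * bounce/map/unmap/sync paths (hooks not shown) and a kthread that reports
     * them every 5 seconds. Not the patch posted in this thread. */
    #include <linux/kernel.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/atomic.h>
    #include <linux/module.h>

    static atomic_t swiotlb_dbg_bounce_from, swiotlb_dbg_bounce_to;
    static atomic_t swiotlb_dbg_map, swiotlb_dbg_unmap, swiotlb_dbg_sync;

    /* Example hook, to be called from the swiotlb bounce path. */
    void swiotlb_dbg_count_bounce_from(void)
    {
            atomic_inc(&swiotlb_dbg_bounce_from);
    }

    static int dump_swiotlb_thread(void *unused)
    {
            while (!kthread_should_stop()) {
                    msleep(5000);
                    /* atomic_xchg() reads and resets each per-interval count. */
                    printk(KERN_INFO "bounce: from:%d to:%d map:%d unmap:%d sync:%d\n",
                           atomic_xchg(&swiotlb_dbg_bounce_from, 0),
                           atomic_xchg(&swiotlb_dbg_bounce_to, 0),
                           atomic_xchg(&swiotlb_dbg_map, 0),
                           atomic_xchg(&swiotlb_dbg_unmap, 0),
                           atomic_xchg(&swiotlb_dbg_sync, 0));
            }
            return 0;
    }

    static int __init dump_swiotlb_init(void)
    {
            kthread_run(dump_swiotlb_thread, NULL, "dump_swiotlb");
            return 0;
    }
    module_init(dump_swiotlb_init);
    MODULE_LICENSE("GPL");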
Thank you, Konrad. I applied the patch to 3.1.2. In order to have a clear picture,
I only enabled one PCI card. The result is:

[ 28.028032] Starting SWIOTLB debug thread.
[ 28.028076] swiotlb_start_thread: Go!
[ 28.028622] xen_swiotlb_start_thread: Go!
[ 33.028153] 0 [budget_av 0000:00:00.0] bounce: from:555352(slow:0)to:0 map:329 unmap:0 sync:555352
[ 33.028294] SWIOTLB is 2% full
[ 38.028178] 0 budget_av 0000:00:00.0 alloc coherent: 4, free: 0
[ 38.028230] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 38.028352] SWIOTLB is 2% full
[ 43.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.028310] SWIOTLB is 2% full
[ 48.028199] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 48.028334] SWIOTLB is 2% full
[ 53.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 53.028309] SWIOTLB is 2% full
[ 58.028138] 0 [budget_av 0000:00:00.0] bounce: from:126994(slow:0)to:0 map:0 unmap:0 sync:126994
[ 58.028195] SWIOTLB is 2% full
[ 63.028170] 0 [budget_av 0000:00:00.0] bounce: from:121401(slow:0)to:0 map:0 unmap:0 sync:121401
[ 63.029560] SWIOTLB is 2% full
[ 68.028193] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 68.028329] SWIOTLB is 2% full
[ 73.028104] 0 [budget_av 0000:00:00.0] bounce: from:122717(slow:0)to:0 map:0 unmap:0 sync:122717
[ 73.028244] SWIOTLB is 2% full
[ 78.028191] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 78.028331] SWIOTLB is 2% full
[ 83.028112] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 83.028171] SWIOTLB is 2% full

Was that long enough? I hope this helps.

Carsten.
Here with two cards enabled and creating a bit of "work" by watching TV with one of them:

[ 23.842720] Starting SWIOTLB debug thread.
[ 23.842750] swiotlb_start_thread: Go!
[ 23.842838] xen_swiotlb_start_thread: Go!
[ 28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
[ 28.841592] SWIOTLB is 4% full
[ 33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 33.840283] SWIOTLB is 4% full
[ 33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
[ 38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 38.840361] SWIOTLB is 4% full
[ 43.840182] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.840323] SWIOTLB is 4% full
[ 48.840094] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 48.840154] SWIOTLB is 4% full
[ 53.840160] 0 [budget_av 0000:00:01.0] bounce: from:119756(slow:0)to:0 map:0 unmap:0 sync:119756
[ 53.840301] SWIOTLB is 4% full
[ 58.840202] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 58.840339] SWIOTLB is 4% full
[ 63.840626] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 63.840686] SWIOTLB is 4% full
[ 68.840122] 0 [budget_av 0000:00:01.0] bounce: from:127323(slow:0)to:0 map:0 unmap:0 sync:127323
[ 68.840180] SWIOTLB is 4% full
[ 73.840647] 0 [budget_av 0000:00:01.0] bounce: from:211547(slow:0)to:0 map:0 unmap:0 sync:211547
[ 73.840784] SWIOTLB is 4% full
[ 78.840204] 0 [budget_av 0000:00:01.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 78.840344] SWIOTLB is 4% full
[ 83.840114] 0 [budget_av 0000:00:01.0] bounce: from:255304(slow:0)to:0 map:0 unmap:0 sync:255304
[ 83.840178] SWIOTLB is 4% full
[ 88.840158] 0 [budget_av 0000:00:01.0] bounce: from:256620(slow:0)to:0 map:0 unmap:0 sync:256620
[ 88.840302] SWIOTLB is 4% full
[ 93.840185] 0 [budget_av 0000:00:00.0] bounce: from:250040(slow:0)to:0 map:0 unmap:0 sync:250040
[ 93.840319] SWIOTLB is 4% full
[ 98.840181] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 98.841563] SWIOTLB is 4% full
[ 103.841221] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 103.841361] SWIOTLB is 4% full
[ 108.840247] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 108.840389] SWIOTLB is 4% full
[ 113.840157] 0 [budget_av 0000:00:00.0] bounce: from:261555(slow:0)to:0 map:0 unmap:0 sync:261555
[ 113.840298] SWIOTLB is 4% full
[ 118.840119] 0 [budget_av 0000:00:00.0] bounce: from:295442(slow:0)to:0 map:0 unmap:0 sync:295442
[ 118.840259] SWIOTLB is 4% full
[ 123.841025] 0 [budget_av 0000:00:00.0] bounce: from:295113(slow:0)to:0 map:0 unmap:0 sync:295113
[ 123.841164] SWIOTLB is 4% full
[ 128.840175] 0 [budget_av 0000:00:00.0] bounce: from:294784(slow:0)to:0 map:0 unmap:0 sync:294784
[ 128.840310] SWIOTLB is 4% full
[ 133.840194] 0 [budget_av 0000:00:00.0] bounce: from:293797(slow:0)to:0 map:0 unmap:0 sync:293797
[ 133.840330] SWIOTLB is 4% full
[ 138.840498] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 138.840637] SWIOTLB is 4% full
[ 143.840173] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 143.840313] SWIOTLB is 4% full
[ 148.840215] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 148.840355] SWIOTLB is 4% full
[ 153.840205] 0 [budget_av 0000:00:01.0] bounce: from:329658(slow:0)to:0 map:0 unmap:0 sync:329658
[ 153.840341] SWIOTLB is 4% full
[ 158.840137] 0 [budget_av 0000:00:00.0] bounce: from:342160(slow:0)to:0 map:0 unmap:0 sync:342160
[ 158.840277] SWIOTLB is 4% full
[ 163.841288] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 163.841424] SWIOTLB is 4% full
[ 168.840198] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 168.840339] SWIOTLB is 4% full
[ 173.840167] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 173.840304] SWIOTLB is 4% full
[ 178.840184] 0 [budget_av 0000:00:00.0] bounce: from:328013(slow:0)to:0 map:0 unmap:0 sync:328013
[ 178.840324] SWIOTLB is 4% full
[ 183.840129] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 183.840269] SWIOTLB is 4% full
[ 188.840123] 0 [budget_av 0000:00:01.0] bounce: from:340515(slow:0)to:0 map:0 unmap:0 sync:340515
[ 188.841647] SWIOTLB is 4% full
[ 193.840192] 0 [budget_av 0000:00:00.0] bounce: from:338541(slow:0)to:0 map:0 unmap:0 sync:338541
[ 193.840329] SWIOTLB is 4% full
[ 198.840148] 0 [budget_av 0000:00:01.0] bounce: from:330316(slow:0)to:0 map:0 unmap:0 sync:330316
[ 198.840230] SWIOTLB is 4% full
[ 203.840860] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 203.841000] SWIOTLB is 4% full
[ 208.840562] 0 [budget_av 0000:00:01.0] bounce: from:337883(slow:0)to:0 map:0 unmap:0 sync:337883
[ 208.840698] SWIOTLB is 4% full
[ 213.840171] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 213.840311] SWIOTLB is 4% full
[ 218.840214] 0 [budget_av 0000:00:01.0] bounce: from:320117(slow:0)to:0 map:0 unmap:0 sync:320117
[ 218.840354] SWIOTLB is 4% full
[ 223.840238] 0 [budget_av 0000:00:01.0] bounce: from:299390(slow:0)to:0 map:0 unmap:0 sync:299390
[ 223.840373] SWIOTLB is 4% full
[ 228.841415] 0 [budget_av 0000:00:01.0] bounce: from:298732(slow:0)to:0 map:0 unmap:0 sync:298732
[ 228.841560] SWIOTLB is 4% full
[ 233.840705] 0 [budget_av 0000:00:00.0] bounce: from:299061(slow:0)to:0 map:0 unmap:0 sync:299061
[ 233.840844] SWIOTLB is 4% full
[ 238.840145] 0 [budget_av 0000:00:01.0] bounce: from:293468(slow:0)to:0 map:0 unmap:0 sync:293468
[ 238.840280] SWIOTLB is 4% full

Carsten.
I should perhaps mention that I create the DomU with only the parameter iommu=soft.
I hope nothing more is required. For the Xenified kernel, it's swiotlb=32,force.

Carsten.
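For context, the relevant bits of such a PV DomU configuration would look roughly like this in xm/xend syntax; the PCI addresses and sizes below are examples, not Carsten's actual config:

    # Guest kernel command line: use the software swiotlb for the
    # passed-through cards
    extra  = "iommu=soft"

    # PCI passthrough of the two DVB-C cards (example BDF addresses)
    pci    = [ '04:00.0', '05:00.0' ]

    memory = 1024
    vcpus  = 1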
Konrad Rzeszutek Wilk
2011-Dec-06 03:26 UTC
Re: Load increase after memory upgrade (part2)
On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
>
> [   23.842720] Starting SWIOTLB debug thread.
> [   23.842750] swiotlb_start_thread: Go!
> [   23.842838] xen_swiotlb_start_thread: Go!
> [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> [   28.841592] SWIOTLB is 4% full
> [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> [   33.840283] SWIOTLB is 4% full
> [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310

Whoa. Yes. You are definitely using the bounce buffer :-)

Now it is time to look at why the driver is not using those coherent ones - it
looks to allocate just eight of them but does not use them.. Unless it is
using them _and_ bouncing them (which would be odd).

And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
are using of the default size.

I should find out _why_ the old Xen kernels do not use the bounce buffer
so much...
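To make the distinction above concrete, here is a minimal sketch (not from the thread, and not the budget_av code) of the two DMA paths a driver can take; the struct and function names (demo_dev, demo_setup) are invented for illustration. The coherent buffer never bounces, while every map/sync/unmap of the streaming buffer is exactly what the debug counters above are counting.

    #include <linux/dma-mapping.h>
    #include <linux/pci.h>
    #include <linux/slab.h>

    struct demo_dev {
            struct pci_dev *pdev;
            void *ring;             /* coherent buffer: always device-reachable */
            dma_addr_t ring_dma;
            void *pkt;              /* ordinary kernel memory: may get bounced */
            dma_addr_t pkt_dma;
    };

    static int demo_setup(struct demo_dev *d, size_t ring_bytes, size_t pkt_bytes)
    {
            /* Coherent allocation: the DMA layer (Xen swiotlb included) returns
             * memory below the device's DMA mask, so it is never bounced. */
            d->ring = dma_alloc_coherent(&d->pdev->dev, ring_bytes,
                                         &d->ring_dma, GFP_KERNEL);
            if (!d->ring)
                    return -ENOMEM;

            /* Streaming mapping: whatever buffer the driver already has gets
             * mapped per I/O.  If it sits above the mask (e.g. >4GB for a
             * 32-bit card), swiotlb copies it into its bounce pool here... */
            d->pkt = kmalloc(pkt_bytes, GFP_KERNEL);
            if (!d->pkt)
                    return -ENOMEM;  /* sketch: ring cleanup omitted */
            d->pkt_dma = dma_map_single(&d->pdev->dev, d->pkt, pkt_bytes,
                                        DMA_FROM_DEVICE);

            /* ...and every sync/unmap copies it back - the "bounce"/"sync"
             * counters in the debug output above. */
            dma_sync_single_for_cpu(&d->pdev->dev, d->pkt_dma, pkt_bytes,
                                    DMA_FROM_DEVICE);
            dma_unmap_single(&d->pdev->dev, d->pkt_dma, pkt_bytes, DMA_FROM_DEVICE);
            return 0;
    }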
Konrad Rzeszutek Wilk
2011-Dec-14 20:23 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
> >
> > [   23.842720] Starting SWIOTLB debug thread.
> > [   23.842750] swiotlb_start_thread: Go!
> > [   23.842838] xen_swiotlb_start_thread: Go!
> > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > [   28.841592] SWIOTLB is 4% full
> > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > [   33.840283] SWIOTLB is 4% full
> > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
>
> Whoa. Yes. You are definitely using the bounce buffer :-)
>
> Now it is time to look at why the driver is not using those coherent ones - it
> looks to allocate just eight of them but does not use them.. Unless it is
> using them _and_ bouncing them (which would be odd).
>
> And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
> are using of the default size.

So I am able to see this with an atl1c ethernet driver on my SandyBridge i3
box. It looks as if the card is truly 32-bit, so on a box with 8GB it
bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
bounces (no surprise there).

In other words - I see the same behavior you are seeing. Now off to:

> > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > so much...

which will require some fiddling around.
Konrad Rzeszutek Wilk
2011-Dec-14 22:07 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Dec 14, 2011 at 04:23:51PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > > Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
> > >
> > > [   23.842720] Starting SWIOTLB debug thread.
> > > [   23.842750] swiotlb_start_thread: Go!
> > > [   23.842838] xen_swiotlb_start_thread: Go!
> > > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > > [   28.841592] SWIOTLB is 4% full
> > > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > > [   33.840283] SWIOTLB is 4% full
> > > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
> >
> > Whoa. Yes. You are definitely using the bounce buffer :-)
> >
> > Now it is time to look at why the driver is not using those coherent ones - it
> > looks to allocate just eight of them but does not use them.. Unless it is
> > using them _and_ bouncing them (which would be odd).
> >
> > And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
> > are using of the default size.
>
> So I am able to see this with an atl1c ethernet driver on my SandyBridge i3
> box. It looks as if the card is truly 32-bit, so on a box with 8GB it
> bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
> bounces (no surprise there).
>
> In other words - I see the same behavior you are seeing. Now off to:
>
> > > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > > so much...
>
> which will require some fiddling around.

And I am not seeing any difference - the swiotlb is used with the same usage
when booting a classic (old-style XenoLinux) 2.6.32 vs. a brand new pvops (3.2).
Obviously, if I limit the physical amount of memory (so 'mem=4GB' on the Xen
hypervisor line), the bounce usage disappears.

Hmm, I wonder if there is a nice way to tell the hypervisor - hey, please
stuff dom0 under 4GB.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?
...
> which will require some fiddling around.
>
> Here is the patch I used against classic XenLinux. Any chance you could run
> it with your classic guests and see what numbers you get?

Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.

Carsten.
Well, it will do nothing but print out “SWIOTLB is 0% full”. Does that help? Or do you think something went wrong with the patch… BR, Carsten. Von: Carsten Schiers Gesendet: Donnerstag, 15. Dezember 2011 15:53 An: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel Betreff: AW: [Xen-devel] Load increase after memory upgrade (part2) ...> which will require some fiddling around.Here is the patch I used against classic XenLinux. Any chance you could run it with your classis guests and see what numbers you get? Sure, it might take a bit, but I''ll try it with my 2.6.34 classic kernel. Carsten. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Dec-16 15:04 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Dec 16, 2011 at 03:56:10PM +0100, Carsten Schiers wrote:> Well, it will do nothing but print out “SWIOTLB is 0% full”. > > > Does that help? Or do you think something went wrong with the patch… >And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? Could you append the dmesg output please? Thanks.> > BR, > > Carsten. > > > > > Von: Carsten Schiers > Gesendet: Donnerstag, 15. Dezember 2011 15:53 > An: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk > Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel > Betreff: AW: [Xen-devel] Load increase after memory upgrade (part2) > > > ... > > > which will require some fiddling around. > > Here is the patch I used against classic XenLinux. Any chance you could run > it with your classis guests and see what numbers you get? > > Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel. > > > Carsten. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?

Yes, two of them with swiotlb=32,force.

> Could you append the dmesg output please?

Attached. You will find a "normal" boot after the one with the patched kernel.

Carsten.
Konrad Rzeszutek Wilk
2011-Dec-16 16:19 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? > > Yes, two of them with swiotlb=32,force. > > > > Could you append the dmesg output please? > > Attached. You find a "normal" boot after the one with the patched kernel.Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup? Thanks for being willing to be a guinea pig while trying to fix this.> > Carsten. > >
OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance to check that the patch is working? Does it print out something else with your setting? BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Freitag, 16. Dezember 2011 17:19 An: Carsten Schiers Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? > > Yes, two of them with swiotlb=32,force. > > > > Could you append the dmesg output please? > > Attached. You find a "normal" boot after the one with the patched kernel.Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup? Thanks for being willing to be a guinea pig while trying to fix this.> > Carsten. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... I''m not having much time now, hoping to get back with a full report soon. -- Sander Saturday, December 17, 2011, 11:12:45 PM, you wrote:> OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance > to check that the patch is working? Does it print out something else with your setting? BR, Carsten.> -----Ursprüngliche Nachricht----- > Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk > Gesendet: Freitag, 16. Dezember 2011 17:19 > An: Carsten Schiers > Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell > Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2)> On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote: >> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? >> >> Yes, two of them with swiotlb=32,force. >> >> >> > Could you append the dmesg output please? >> >> Attached. You find a "normal" boot after the one with the patched kernel.> Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup?> Thanks for being willing to be a guinea pig while trying to fix this. >> >> Carsten. >> >>> _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel-- Best regards, Sander mailto:linux@eikelenboom.it
Konrad Rzeszutek Wilk
2011-Dec-19 14:54 UTC
Re: Load increase after memory upgrade (part2)
On Sat, Dec 17, 2011 at 11:12:45PM +0100, Carsten Schiers wrote:
> OK, double checked. Both PCI cards enabled, running, working, but nothing but
> "SWIOTLB is 0% full". Any chance to check that the patch is working? Does it
> print out something else with your setting? BR, Carsten.

Hm, and with the pvops kernel you got some numbers along with tons of 'bounce'.

The one thing that I neglected in this patch is the alloc_coherent part..
which I don't think is that important, as we did show that the alloc buffers
are used.

I don't have anything concrete yet, but after the holidays I should have a
better idea of what is happening. Thanks for being willing to test this!
Konrad Rzeszutek Wilk
2011-Dec-19 14:56 UTC
Re: Load increase after memory upgrade (part2)
On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:> I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... > I''m not having much time now, hoping to get back with a full report soon.Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect when running as PV guest .. Will look in more details after the holidays. Thanks for being willing to try it out.
Konrad Rzeszutek Wilk
2012-Jan-10 21:55 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote: > > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... > > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... > > I''m not having much time now, hoping to get back with a full report soon. > > Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect > when running as PV guest .. Will look in more details after the > holidays. Thanks for being willing to try it out.Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU: [ 771.896140] SWIOTLB is 11% full [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0 [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0 [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0 but interestingly enough, if I boot the guest as the first one I do not get these bounce requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same numbers.
Hello Konrad,

Tuesday, January 10, 2012, 10:55:33 PM, you wrote:

> On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
>> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers, in dom0 I get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> > I'm not having much time now, hoping to get back with a full report soon.
>>
>> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> when running as PV guest .. Will look in more details after the
>> holidays. Thanks for being willing to try it out.

> Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:

> [  771.896140] SWIOTLB is 11% full
> [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

> but interestingly enough, if I boot the guest as the first one I do not get these bounce
> requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> numbers.

I started to experiment some more with what I encountered.

On dom0 I was seeing that my r8169 ethernet controllers were using bounce buffering with the dump-swiotlb module.
It was showing "12% full".
Checking in sysfs shows:

serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
32

If I remember correctly, wasn't the allocation for dom0 changed to be at the top of memory instead of low, somewhere between 2.6.32 and 3.0?
Could that change cause all devices to need bounce buffering, and could it therefore explain some people seeing more CPU usage for dom0?

I have forced my r8169 to use a 64-bit DMA mask (using use_dac=1):

serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
64

This results in dump-swiotlb reporting:

[ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1265.625043] SWIOTLB is 0% full
[ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
[ 1270.635024] SWIOTLB is 0% full
[ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1275.644261] SWIOTLB is 0% full
[ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

So it has changed from 12% to 0%, although it still reports something about bouncing? Or am I misinterpreting stuff?

Another thing I was wondering about: couldn't the hypervisor offer a small window in 32-bit addressable memory to all domUs (or only when PCI passthrough is used) to be used for DMA?

(Oh yes, I haven't got a clue what I'm talking about ... so it probably makes no sense at all :-) )

--
Sander
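As an aside (an illustration, not the actual r8169 code): what use_dac=1 effectively does inside a driver is the mask negotiation sketched below - try the 64-bit DMA mask and fall back to 32-bit. A device left at the 32-bit mask is what forces swiotlb to bounce any buffer whose machine address is above 4GB; the probe function name is a placeholder.

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int demo_probe_dma(struct pci_dev *pdev, bool use_dac)
    {
            /* With a 64-bit mask the DMA layer trusts the device to reach any
             * address, so swiotlb never has to bounce for it. */
            if (use_dac && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
                    return 0;

            /* Otherwise fall back to 32-bit: any buffer whose machine address
             * lies above 4GB will be bounced through the swiotlb pool. */
            if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) ||
                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            return 0;
    }

The dma_mask_bits / consistent_dma_mask_bits files in sysfs simply reflect whichever mask the driver ended up setting here.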
>>> On 12.01.12 at 23:06, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Another thing i was wondering about, couldn't the hypervisor offer a small
> window in 32bit addressable mem to all (or only when pci passthrough is used)
> domU's to be used for DMA ?

How would use of such a range be arbitrated/protected? You'd have to ask for
reservation (aka allocation) of a chunk anyway, which is as good as using the
existing interfaces to obtain address-restricted memory (and the hypervisor
has a [rudimentary] mechanism to preserve some low memory for DMA
allocations).

Jan
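For context on the "existing interfaces to obtain address restricted memory": in the pvops kernels of this era, the kernel-side primitive is xen_create_contiguous_region(), which is what xen-swiotlb itself uses to pull its bounce pool under 4GB. A rough sketch of using it directly follows; the chunk size is arbitrary, error handling is minimal, and the call signature assumed is the 2.6.32-3.2 one.

    #include <linux/gfp.h>
    #include <xen/xen-ops.h>

    #define DEMO_ORDER 4    /* 64KB chunk, arbitrary for illustration */

    static void *demo_alloc_low_dma_chunk(void)
    {
            void *buf = (void *)__get_free_pages(GFP_KERNEL, DEMO_ORDER);

            if (!buf)
                    return NULL;

            /* Ask the hypervisor to swap the machine frames behind this
             * pseudo-physical range for frames that are contiguous and lie
             * below 2^32, i.e. reachable by a 32-bit DMA engine. */
            if (xen_create_contiguous_region((unsigned long)buf, DEMO_ORDER, 32)) {
                    free_pages((unsigned long)buf, DEMO_ORDER);
                    return NULL;
            }
            return buf;
    }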
Konrad Rzeszutek Wilk
2012-Jan-13 15:13 UTC
Re: Load increase after memory upgrade (part2)
> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers, in dom0 I get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> >> > I'm not having much time now, hoping to get back with a full report soon.
> >>
> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> >> when running as PV guest .. Will look in more details after the
> >> holidays. Thanks for being willing to try it out.
>
> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
>
> > [  771.896140] SWIOTLB is 11% full
> > [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> > [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> > [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
>
> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> > numbers.
>
> I started to experiment some more with what i encountered.
>
> On dom0 i was seeing that my r8169 ethernet controllers were using bounce buffering with the dump-swiotlb module.
> It was showing "12% full".
> Checking in sysfs shows:
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 32
>
> If i remember correctly wasn't the allocation for dom0 changed to be at the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

We never actually had dom0 support in the upstream kernel until 2.6.37. The
2.6.32<->2.6.36 you are referring to must have been the trees that I spun up -
but the implementation of SWIOTLB in them had not really changed.

> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?

The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in
domU with guests. And that the older domUs (XenOLinux) do not have this.

That I can't understand - the implementation in both cases _looks_ to do the
same thing. There was one issue I found in the upstream one, but even with
that fix I still get that "bounce" usage in domU.

Interestingly enough, I get that only if I have launched, destroyed, launched,
etc., the guest multiple times before I get this. Which leads me to believe
this is not a kernel issue, but that we have simply fragmented the Xen memory
so much that when it launches the guest all of the memory is above 4GB. But
that seems counter-intuitive, as by default Xen starts guests at the far end
of memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB
roughly). The SWIOTLB swizzles some memory under the 4GB mark, and this is
where we get the bounce buffer effect (as the memory under 4GB is then copied
to the memory at 12GB->16GB).

But it does not explain why on the first couple of starts I did not see this
with pvops. And it does not seem to happen with the XenOLinux kernel, so there
must be something else in here.

> I have forced my r8169 to use a 64-bit dma mask (using use_dac=1)

Ah yes.

> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 64
>
> This results in dump-swiotlb reporting:
>
> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1265.625043] SWIOTLB is 0% full
> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
> [ 1270.635024] SWIOTLB is 0% full
> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1275.644261] SWIOTLB is 0% full
> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

Which is what we expect. No need to bounce, since the PCI adapter can reach
memory above the 4GB mark.

> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

The bouncing can happen due to two cases:
 - Memory is above 4GB.
 - Memory crosses a page boundary (rarely happens).

> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its
pool. But it can't do it for every part of memory. That is why there are DMA
pools, which are used by graphics adapters, video capture devices, storage and
network drivers. They are used for small packet sizes, so that the driver does
not have to allocate DMA buffers when it gets a 100-byte ping response. But
for large packets (say that ISO file you are downloading) it allocates memory
on the fly and "maps" it into the PCI space using the DMA API. That "mapping"
sets up a "physical memory" -> "guest memory" translation - and if that
allocated memory is above 4GB, part of this mapping is to copy ("bounce") the
memory under the 4GB mark (where the Xen SWIOTLB has allocated a pool), so
that the adapter can physically fetch/put the data. Once that is completed, it
is "sync"-ed back, which is bouncing that data to the "allocated memory".

So having a DMA pool is very good - and most drivers use it. The things I
can't figure out are:
 - why the DVB drivers do not seem to use it, even though they look to use the
   videobuf_dma driver;
 - why the XenOLinux kernel does not seem to have this problem (and this might
   be false - perhaps it does have this problem and it just takes a couple of
   guest launches, destructions, starts, etc. to actually see it);
 - are there any flags in the domain builder to say: "ok, this domain is going
   to service 32-bit cards, hence build the memory from 0->4GB". This seems
   like a good knob at first, but it probably is a bad idea (imagine using it
   by mistake on every guest). And also nowadays most cards are PCIe and they
   can do 64-bit, so it would not be that important in the future.

> (oh yes, i haven't got a clue what i'm talking about ... so it probably makes no sense at all :-) )

Nonsense. You were on the correct path. Hopefully the level of details hasn't
scared you off now :-)

> --
> Sander
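The "DMA pool" pattern described above looks roughly like the sketch below in a driver (the demo_* names are placeholders, not taken from any of the drivers discussed): descriptors come out of a pool that already sits below the device's DMA mask, so per-packet traffic through it never needs the bounce/sync cycle.

    #include <linux/dmapool.h>
    #include <linux/pci.h>

    struct demo_desc_ring {
            struct dma_pool *pool;
            void *desc;
            dma_addr_t desc_dma;
    };

    static int demo_ring_init(struct pci_dev *pdev, struct demo_desc_ring *r)
    {
            /* One pool of 256-byte, 16-byte-aligned blocks, created once. */
            r->pool = dma_pool_create("demo-desc", &pdev->dev, 256, 16, 0);
            if (!r->pool)
                    return -ENOMEM;

            /* Every allocation from the pool comes back with a bus address the
             * card can use directly - no per-I/O map/bounce/sync cycle. */
            r->desc = dma_pool_alloc(r->pool, GFP_KERNEL, &r->desc_dma);
            if (!r->desc) {
                    dma_pool_destroy(r->pool);
                    return -ENOMEM;
            }
            return 0;
    }

    static void demo_ring_fini(struct demo_desc_ring *r)
    {
            dma_pool_free(r->pool, r->desc, r->desc_dma);
            dma_pool_destroy(r->pool);
    }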
Friday, January 13, 2012, 4:13:07 PM, you wrote:>> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... >> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... >> >> > I''m not having much time now, hoping to get back with a full report soon. >> >> >> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect >> >> when running as PV guest .. Will look in more details after the >> >> holidays. Thanks for being willing to try it out. >> >> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU: >> >> > [ 771.896140] SWIOTLB is 11% full >> > [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0 >> > [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0 >> > [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0 >> >> > but interestingly enough, if I boot the guest as the first one I do not get these bounce >> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same >> > numbers. >> >> >> I started to expiriment some more with what i encountered. >> >> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module. >> It was showing "12% full". >> Checking in sysfs shows: >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits >> 32 >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits >> 32 >> >> If i remember correctly wasn''t the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?> ? We never actually had dom0 support in the upstream kernel until 2.6.37.. The 2.6.32<->2.6.36 you are > referring to must have been the trees that I spun up - but the implementation of SWIOTLB in them > had not really changed.>> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?> The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in domU with guests. > And that the older domU''s (XenOLinux) do not have this.> That I can''t understand - the implementation in both cases _looks_ to do the same thing. > There was one issue I found in the upstream one, but even with that fix I still > get that "bounce" usage in domU.> Interestingly enough, I get that only if I have launched, destroyed, launched, etc, the guest multiple > times before I get this. Which leads me to believe this is not a kernel issue but that we > are simply fragmented the Xen memory so much, so that when it launches the guest all of the > memory is above 4GB. But that seems counter-intuive as by default Xen starts guests at the far end of > memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB roughly). The SWIOTLB > swizzles some memory under the 4GB , and this is where we get the bounce buffer effect > (as the memory from 4GB is then copied to the memory 12GB->16GB).> But it does not explain why on the first couple of starts I did not see this with pvops. > And it does not seem to happen with the XenOLinux kernel, so there must be something else > in here.>> >> I have forced my r8169 to use 64bits dma mask (using use_dac=1)> Ah yes. 
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits >> 32 >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits >> 64 >> >> This results in dump-swiotlb reporting: >> >> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10 >> [ 1265.625043] SWIOTLB is 0% full >> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12 >> [ 1270.635024] SWIOTLB is 0% full >> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10 >> [ 1275.644261] SWIOTLB is 0% full >> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10> Which is what we expect. No need to bounce since the PCI adapter can reach memory > above the 4GB mark.>> >> >> >> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?> The bouncing can happen due to two cases: > - Memory is above 4GB > - Memory crosses a page-boundary (rarely happens). >> >> >> Another thing i was wondering about, couldn''t the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU''s to be used for DMA ?> It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its pool. > But it can''t do it for every part of memory. That is why there are DMA pools > which are used by graphics adapters, video capture devices,storage and network > drivers. They are used for small packet sizes so that the driver does not have > to allocate DMA buffers when it gets a 100bytes ping response. But for large > packets (say that ISO file you are downloading) it allocates memory on the fly > and "maps" it into the PCI space using the DMA API. That "mapping" sets up > an "physical memory" -> "guest memory" translation - and if that allocated > memory is above 4GB, part of this mapping is to copy ("bounce") the memory > under the 4GB (where XenSWIOTLB has allocated a pool), so that the adapter > can physically fetch/put the data. Once that is completed it is "sync"-ed > back, which is bouncing that data to the "allocated memory".> So having a DMA pool is very good - and most drivers use it. The thing I can''t > figure out is: > - why the DVB do not seem to use it, even thought they look to use the videobuf_dma > driver. > - why the XenOLinux does not seem to have this problem (and this might be false - > perhaps it does have this problem and it just takes a couple of guest launches, > destructions, starts, etc to actually see it). > - are there any flags in the domain builder to say: "ok, this domain is going to > service 32-bit cards, hence build the memory from 0->4GB". This seems like > a good know at first, but it probably is a bad idea (imagine using it by mistake > on every guest). And also nowadays most cards are PCIe and they can do 64-bit, so > it would not be that important in the future. >> >> (oh yes, i haven''t got i clue what i''m talking about ... so it probably make no sense at all :-) )> Nonsense. You were on the correct path . Hopefully the level of details hasn''t > scared you off now :-)Well it only gives some more questions :-) The thing is, pci passthrough and especially the DMA part of it, all work behind the scenes without giving much output about the way it is actually working. The thing i was wondering about is if my AMD IOMMU is actually doing something for PV guests. 
When booting with iommu=off machine has 8GB mem, dom0 limited to 1024M and just starting one domU with iommu=soft, with pci-passthrough and the USB pci-cards with USB videograbbers attached to it, i would expect to find some bounce buffering going. (HV_START_LOW 18446603336221196288) (FEATURES ''!writable_page_tables|pae_pgdir_above_4gb'') (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux) (HYPERCALL_PAGE 18446744071578849280) (LOADER generic) (SUSPEND_CANCEL 1) (PAE_MODE yes) (ENTRY 18446744071594476032) (XEN_VERSION xen-3.0) Still i only see: [ 47.449072] Starting SWIOTLB debug thread. [ 47.449090] swiotlb_start_thread: Go! [ 47.449262] xen_swiotlb_start_thread: Go! [ 52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0 [ 52.449180] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:16 map:23 unmap:0 sync:0 [ 52.449187] 2 [ohci_hcd 0000:0a:00.4] bounce: from:0(slow:0)to:4 map:5 unmap:0 sync:0 [ 52.449226] SWIOTLB is 0% full [ 57.449180] 0 ehci_hcd 0000:0a:00.3 alloc coherent: 35, free: 0 [ 57.449219] 1 ohci_hcd 0000:0a:00.6 alloc coherent: 1, free: 0 [ 57.449265] SWIOTLB is 0% full [ 62.449176] SWIOTLB is 0% full [ 67.449336] SWIOTLB is 0% full [ 72.449279] SWIOTLB is 0% full [ 77.449121] SWIOTLB is 0% full [ 82.449236] SWIOTLB is 0% full [ 87.449242] SWIOTLB is 0% full [ 92.449241] SWIOTLB is 0% full [ 172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0 [ 172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0 [ 172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0 [ 172.449170] SWIOTLB is 0% full [ 177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0 [ 177.449131] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:76 map:112 unmap:0 sync:0 [ 177.449138] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:4 map:6 unmap:0 sync:0 [ 177.449178] SWIOTLB is 0% full [ 182.449143] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5349(slow:0)to:563 map:5899 unmap:5949 sync:0 [ 182.449157] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:27 map:35 unmap:0 sync:0 [ 182.449164] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:15 unmap:0 sync:0 [ 182.449204] SWIOTLB is 0% full [ 187.449112] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5375(slow:0)to:592 map:5941 unmap:6022 sync:0 [ 187.449126] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:46 map:69 unmap:0 sync:0 [ 187.449133] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:9 map:12 unmap:0 sync:0 [ 187.449173] SWIOTLB is 0% full [ 192.449183] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5360(slow:0)to:556 map:5890 unmap:5978 sync:0 [ 192.449226] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:52 map:74 unmap:0 sync:0 [ 192.449234] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:14 unmap:0 sync:0 [ 192.449275] SWIOTLB is 0% full And the devices do work ... so how does that work ... Thx for your explanation so far ! -- Sander>> >> >> -- >> Sander >> >>-- Best regards, Sander mailto:linux@eikelenboom.it
Konrad Rzeszutek Wilk
2012-Jan-17 21:02 UTC
Re: Load increase after memory upgrade (part2)
> The thing i was wondering about is if my AMD IOMMU is actually doing something for PV guests.
> When booting with iommu=off, machine has 8GB mem, dom0 limited to 1024M and just starting one domU with iommu=soft, with pci-passthrough and the USB pci-cards with USB videograbbers attached to it, i would expect to find some bounce buffering going on.
>
> (HV_START_LOW 18446603336221196288)
> (FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
> (VIRT_BASE 18446744071562067968)
> (GUEST_VERSION 2.6)
> (PADDR_OFFSET 0)
> (GUEST_OS linux)
> (HYPERCALL_PAGE 18446744071578849280)
> (LOADER generic)
> (SUSPEND_CANCEL 1)
> (PAE_MODE yes)
> (ENTRY 18446744071594476032)
> (XEN_VERSION xen-3.0)
>
> Still i only see:
>
> [   47.449072] Starting SWIOTLB debug thread.
> [   47.449090] swiotlb_start_thread: Go!
> [   47.449262] xen_swiotlb_start_thread: Go!
> [   52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0

There is bouncing there.

..
> [  172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0

And there.. 3839 of them.

> [  172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
> [  172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
> [  172.449170] SWIOTLB is 0% full
> [  177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0

And 5348 here! So bounce-buffering is definitely happening with this guest.

.. snip..
> And the devices do work ... so how does that work ...

Most (all?) drivers are written to work with bounce-buffering. That has never
been a problem.

The issue, as I understand it, is that the DVB drivers allocate their buffers
from 0->4GB most (all?) of the time, so they never have to do
bounce-buffering. While the pv-ops one ends up quite frequently doing the
bounce-buffering, which implies that the DVB drivers end up allocating their
buffers above the 4GB mark. This means we end up spending some CPU time (in
the guest) copying the memory from the >4GB region to the 0-4GB region (and
vice versa).

And I am not clear why this is happening. Hence my thought was to run a
Xen-O-Linux kernel v2.6.3X and a pvops v2.6.3X (where X is the same) with the
same PCI device (and the test would entail rebooting the box in between the
launches) to confirm that the Xen-O-Linux one is doing something that the
pvops one is not.

So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel, so :-(

> Thx for your explanation so far !

Sure thing.
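The decision that produces those counters can be summed up in one predicate. The sketch below shows the shape of the swiotlb check, not the literal lib/swiotlb.c or swiotlb-xen code; in the Xen variant, the phys-to-bus step also performs the pfn-to-mfn translation, which is why guest-physical layout alone does not predict bouncing.

    #include <linux/device.h>
    #include <linux/dma-mapping.h>

    /* Bounce only when the buffer's bus address is not reachable with the
     * device's DMA mask - e.g. a machine address above 4GB for a 32-bit card. */
    static bool demo_needs_bounce(struct device *dev, phys_addr_t phys, size_t size)
    {
            dma_addr_t bus = phys_to_dma(dev, phys);

            return !dma_capable(dev, bus, size);
    }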
On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:> > > > And the devices do work ... so how does that work ... > > Most (all?) drivers are written to work with bounce-buffering. > That has never been a problem. > > The issue as I understand is that the DVB drivers allocate their buffers > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > While the pv-ops one ends up quite frequently doing the bounce-buffering, which > implies that the DVB drivers end up allocating their buffers above the 4GB. > This means we end up spending some CPU time (in the guest) copying the memory > from >4GB to 0-4GB region (And vice-versa). > > And I am not clear why this is happening. Hence my thought > was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the > same) with the same PCI device (and the test would entail rebooting the > box in between the launches) to confirm that the Xen-O-Linux is doing something > that the PVOPS is not. > > So far, I''ve haven''t had much luck compiling a Xen-O-Linux v2.6.38 kernel > so :-( >Did you try downloading a binary rpm (or src.rpm) from OpenSuse? I think they have 2.6.38 xenlinux kernel available. -- Pasi
>>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > The issue as I understand is that the DVB drivers allocate their buffers > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > which > implies that the DVB drivers end up allocating their buffers above the 4GB. > This means we end up spending some CPU time (in the guest) copying the > memory > from >4GB to 0-4GB region (And vice-versa).This reminds me of something (not sure what XenoLinux you use for comparison) - how are they allocating that memory? Not vmalloc_32() by chance (I remember having seen numerous uses under - iirc - drivers/media/)? Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do what their (driver) callers might expect in a PV guest (including the contiguity assumption for the latter, recalling that you earlier said you were able to see the problem after several guest starts), and I had put into our kernels an adjustment to make vmalloc_32() actually behave as expected. Jan
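A small debugging illustration of Jan's point (not something posted in the thread; it assumes a Xen pvops kernel where pfn_to_mfn() is available, and the function name is made up): walking a vmalloc_32() buffer and comparing guest-physical frames with machine frames shows pages that honour GFP_DMA32 in pseudo-physical terms yet still sit above 4GB in machine terms.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>
    #include <asm/xen/page.h>

    static void demo_check_vmalloc32(size_t size)
    {
            void *buf = vmalloc_32(size);
            unsigned long off;

            if (!buf)
                    return;

            for (off = 0; off < size; off += PAGE_SIZE) {
                    struct page *pg = vmalloc_to_page(buf + off);
                    unsigned long pfn = page_to_pfn(pg);
                    unsigned long mfn = pfn_to_mfn(pfn);

                    /* The guest-physical frame honours GFP_DMA32, but the
                     * machine frame is what the card actually DMAs to. */
                    if (((u64)mfn << PAGE_SHIFT) >= (1ULL << 32))
                            pr_info("offset %lx: pfn %lx ok, mfn %lx above 4GB\n",
                                    off, pfn, mfn);
            }
            vfree(buf);
    }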
>>> On 18.01.12 at 12:28, Pasi Kärkkäinen<pasik@iki.fi> wrote: > On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote: >> > >> > And the devices do work ... so how does that work ... >> >> Most (all?) drivers are written to work with bounce-buffering. >> That has never been a problem. >> >> The issue as I understand is that the DVB drivers allocate their buffers >> from 0->4GB most (all the time?) so they never have to do bounce-buffering. >> >> While the pv-ops one ends up quite frequently doing the bounce-buffering, > which >> implies that the DVB drivers end up allocating their buffers above the 4GB. >> This means we end up spending some CPU time (in the guest) copying the > memory >> from >4GB to 0-4GB region (And vice-versa). >> >> And I am not clear why this is happening. Hence my thought >> was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the >> same) with the same PCI device (and the test would entail rebooting the >> box in between the launches) to confirm that the Xen-O-Linux is doing > something >> that the PVOPS is not. >> >> So far, I've haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel >> so :-( >> > > Did you try downloading a binary rpm (or src.rpm) from OpenSuse? > I think they have 2.6.38 xenlinux kernel available.openSUSE 11.4 is using 2.6.37; 12.1 is on 3.1 (and SLE is on 3.0). Pulling out (consistent) patches at 2.6.38 level might be a little involved. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2012-Jan-18 14:29 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:> >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > The issue as I understand is that the DVB drivers allocate their buffers > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > > which > > implies that the DVB drivers end up allocating their buffers above the 4GB. > > This means we end up spending some CPU time (in the guest) copying the > > memory > > from >4GB to 0-4GB region (And vice-versa). > > This reminds me of something (not sure what XenoLinux you use for > comparison) - how are they allocating that memory? Not vmalloc_32()I was using the 2.6.18, then the one I saw on Google for Gentoo, and now I am going to look at the 2.6.38 from OpenSuSE.> by chance (I remember having seen numerous uses under - iirc - > drivers/media/)? > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > what their (driver) callers might expect in a PV guest (including the > contiguity assumption for the latter, recalling that you earlier said > you were able to see the problem after several guest starts), and I > had put into our kernels an adjustment to make vmalloc_32() actually > behave as expected.Aaah.. The plot thickens! Let me look in the sources! Thanks for the pointer.
Konrad Rzeszutek Wilk
2012-Jan-23 22:32 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their buffers
> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > >
> > > While the pv-ops one ends up quite frequently doing the bounce-buffering,
> > > which
> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying the
> > > memory
> > > from >4GB to 0-4GB region (And vice-versa).
> >
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
>
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> I am going to look at the 2.6.38 from OpenSuSE.
>
> > by chance (I remember having seen numerous uses under - iirc -
> > drivers/media/)?
> >
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> > what their (driver) callers might expect in a PV guest (including the
> > contiguity assumption for the latter, recalling that you earlier said
> > you were able to see the problem after several guest starts), and I
> > had put into our kernels an adjustment to make vmalloc_32() actually
> > behave as expected.
>
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> pointer.

Jan's hints led me to videobuf-dma-sg.c, which does indeed do vmalloc_32
and then performs PCI DMA operations on the allocated vmalloc_32 area.

So I cobbled up the attached patch (hadn't actually tested it and sadly
won't until next week) which removes the call to vmalloc_32 and instead
sets up a DMA-allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please send me
your logs.

Cheers,
Konrad
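The attachment itself is not reproduced in this archive. The general shape of such a change would be something like the following: allocate the capture buffer page by page through the DMA API instead of vmalloc_32(), so every page is guaranteed to be reachable by the card. This is only an illustration of the idea - the names and structure are invented, and it is not the actual patch.

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>
    #include <linux/slab.h>

    /* Illustrative only: a buffer made of individually DMA-able pages. */
    struct demo_dma_buf {
            int nr_pages;
            void **vaddr;           /* kernel address of each page */
            dma_addr_t *bus;        /* bus address of each page */
    };

    static int demo_dma_buf_alloc(struct device *dev, struct demo_dma_buf *b,
                                  int nr_pages)
    {
            int i;

            b->nr_pages = nr_pages;
            b->vaddr = kcalloc(nr_pages, sizeof(*b->vaddr), GFP_KERNEL);
            b->bus = kcalloc(nr_pages, sizeof(*b->bus), GFP_KERNEL);
            if (!b->vaddr || !b->bus)
                    return -ENOMEM;         /* sketch: cleanup omitted */

            for (i = 0; i < nr_pages; i++) {
                    /* Each page comes back below the device's DMA mask, so the
                     * card can scatter-gather into it without any bouncing. */
                    b->vaddr[i] = dma_alloc_coherent(dev, PAGE_SIZE,
                                                     &b->bus[i], GFP_KERNEL);
                    if (!b->vaddr[i])
                            return -ENOMEM; /* sketch: cleanup omitted */
            }
            return 0;
    }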
>>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote: >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: >> > > The issue as I understand is that the DVB drivers allocate their buffers >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. >> > > >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, >> > > which >> > > implies that the DVB drivers end up allocating their buffers above the > 4GB. >> > > This means we end up spending some CPU time (in the guest) copying the >> > > memory >> > > from >4GB to 0-4GB region (And vice-versa). >> > >> > This reminds me of something (not sure what XenoLinux you use for >> > comparison) - how are they allocating that memory? Not vmalloc_32() >> >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now >> I am going to look at the 2.6.38 from OpenSuSE. >> >> > by chance (I remember having seen numerous uses under - iirc - >> > drivers/media/)? >> > >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do >> > what their (driver) callers might expect in a PV guest (including the >> > contiguity assumption for the latter, recalling that you earlier said >> > you were able to see the problem after several guest starts), and I >> > had put into our kernels an adjustment to make vmalloc_32() actually >> > behave as expected. >> >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the >> pointer. > > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 > and then performs PCI DMA operations on the allocted vmalloc_32 > area. > > So I cobbled up the attached patch (hadn''t actually tested it and sadly > won''t until next week) which removes the call to vmalloc_32 and instead > sets up DMA allocated set of pages.What a big patch (which would need re-doing for every vmalloc_32() caller)! Fixing vmalloc_32() would be much less intrusive (reproducing our 3.2 version of the affected function below, but clearly that''s not pv-ops ready). Jan static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, int node, void *caller) { const int order = 0; struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; #ifdef CONFIG_XEN gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32)); if (dma_mask == (__GFP_DMA | __GFP_DMA32)) gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); #endif nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); area->nr_pages = nr_pages; /* Please note that the recursion is strictly bounded. 
*/ if (array_size > PAGE_SIZE) { pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM, PAGE_KERNEL, node, caller); area->flags |= VM_VPAGES; } else { pages = kmalloc_node(array_size, nested_gfp, node); } area->pages = pages; area->caller = caller; if (!area->pages) { remove_vm_area(area->addr); kfree(area); return NULL; } for (i = 0; i < area->nr_pages; i++) { struct page *page; gfp_t tmp_mask = gfp_mask | __GFP_NOWARN; if (node < 0) page = alloc_page(tmp_mask); else page = alloc_pages_node(node, tmp_mask, order); if (unlikely(!page)) { /* Successfully allocated i pages, free them in __vunmap() */ area->nr_pages = i; goto fail; } area->pages[i] = page; #ifdef CONFIG_XEN if (dma_mask) { if (xen_limit_pages_to_max_mfn(page, 0, 32)) { area->nr_pages = i + 1; goto fail; } if (gfp_mask & __GFP_ZERO) clear_highpage(page); } #endif } if (map_vm_area(area, prot, &pages)) goto fail; return area->addr; fail: warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, allocated %ld of %ld bytes\n", (area->nr_pages*PAGE_SIZE), area->size); vfree(area->addr); return NULL; } ... #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL #elif defined(CONFIG_XEN) #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL #else #define GFP_VMALLOC32 GFP_KERNEL #endif
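For readers without the tree at hand: the GFP_VMALLOC32 macro above is consumed by vmalloc_32() itself, which in kernels of that era is essentially the wrapper below (quoted from memory, so treat it as approximate); with the CONFIG_XEN branch selected, the __GFP_DMA|__GFP_DMA32 combination is what later triggers the xen_limit_pages_to_max_mfn() call in __vmalloc_area_node().

    void *vmalloc_32(unsigned long size)
    {
            return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL,
                                  -1, __builtin_return_address(0));
    }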
Konrad Rzeszutek Wilk
2012-Jan-24 14:17 UTC
Re: Load increase after memory upgrade (part2)
On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote:> >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote: > >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >> > > The issue as I understand is that the DVB drivers allocate their buffers > >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > >> > > > >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > >> > > which > >> > > implies that the DVB drivers end up allocating their buffers above the > > 4GB. > >> > > This means we end up spending some CPU time (in the guest) copying the > >> > > memory > >> > > from >4GB to 0-4GB region (And vice-versa). > >> > > >> > This reminds me of something (not sure what XenoLinux you use for > >> > comparison) - how are they allocating that memory? Not vmalloc_32() > >> > >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now > >> I am going to look at the 2.6.38 from OpenSuSE. > >> > >> > by chance (I remember having seen numerous uses under - iirc - > >> > drivers/media/)? > >> > > >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > >> > what their (driver) callers might expect in a PV guest (including the > >> > contiguity assumption for the latter, recalling that you earlier said > >> > you were able to see the problem after several guest starts), and I > >> > had put into our kernels an adjustment to make vmalloc_32() actually > >> > behave as expected. > >> > >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the > >> pointer. > > > > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 > > and then performs PCI DMA operations on the allocted vmalloc_32 > > area. > > > > So I cobbled up the attached patch (hadn''t actually tested it and sadly > > won''t until next week) which removes the call to vmalloc_32 and instead > > sets up DMA allocated set of pages. > > What a big patch (which would need re-doing for every vmalloc_32() > caller)! Fixing vmalloc_32() would be much less intrusive (reproducing > our 3.2 version of the affected function below, but clearly that''s not > pv-ops ready).I just want to get to the bottom of this before attempting a proper fix.> > Jan > > static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > pgprot_t prot, int node, void *caller) > { > const int order = 0; > struct page **pages; > unsigned int nr_pages, array_size, i; > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > #ifdef CONFIG_XEN > gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > > BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32)); > if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > #endif > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > array_size = (nr_pages * sizeof(struct page *)); > > area->nr_pages = nr_pages; > /* Please note that the recursion is strictly bounded. 
*/ > if (array_size > PAGE_SIZE) { > pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM, > PAGE_KERNEL, node, caller); > area->flags |= VM_VPAGES; > } else { > pages = kmalloc_node(array_size, nested_gfp, node); > } > area->pages = pages; > area->caller = caller; > if (!area->pages) { > remove_vm_area(area->addr); > kfree(area); > return NULL; > } > > for (i = 0; i < area->nr_pages; i++) { > struct page *page; > gfp_t tmp_mask = gfp_mask | __GFP_NOWARN; > > if (node < 0) > page = alloc_page(tmp_mask); > else > page = alloc_pages_node(node, tmp_mask, order); > > if (unlikely(!page)) { > /* Successfully allocated i pages, free them in __vunmap() */ > area->nr_pages = i; > goto fail; > } > area->pages[i] = page; > #ifdef CONFIG_XEN > if (dma_mask) { > if (xen_limit_pages_to_max_mfn(page, 0, 32)) { > area->nr_pages = i + 1; > goto fail; > } > if (gfp_mask & __GFP_ZERO) > clear_highpage(page); > } > #endif > } > > if (map_vm_area(area, prot, &pages)) > goto fail; > return area->addr; > > fail: > warn_alloc_failed(gfp_mask, order, > "vmalloc: allocation failure, allocated %ld of %ld bytes\n", > (area->nr_pages*PAGE_SIZE), area->size); > vfree(area->addr); > return NULL; > } > > ... > > #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) > #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL > #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) > #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL > #elif defined(CONFIG_XEN) > #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL > #else > #define GFP_VMALLOC32 GFP_KERNEL > #endif
Konrad,

I implemented the patch into a 3.1.2 kernel, but the patched function doesn't seem to be called (I set debug=1 for the module). I think it is only used for video capturing devices.

But I grepped around and found a vmalloc_32 in drivers/media/common/saa7146_core.c, line 182, function saa7146_vmalloc_build_pgtable, which is included in module saa7146.ko. This would be the DVB chip. Maybe you can rework the patch so that we can test what you intended to test. Consequently, the patch you did so far doesn't change the load.

Carsten.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On behalf of Konrad Rzeszutek Wilk
Sent: Monday, 23 January 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

[full quote of Konrad's message above snipped]
I can now confirm that saa7146_vmalloc_build_pgtable and vmalloc_to_sg are called once per PCI card and will allocate 329 pages. Sorry, but I am not in the position to modify your patch to patch the functions in the right way, but happy to test... BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad@darnok.org>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <JBeulich@suse.com>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Mo 23.01.2012 23:42 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:vmalloc On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > > The issue as I understand is that the DVB drivers allocate their buffers > > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > > > which > > > implies that the DVB drivers end up allocating their buffers above the 4GB. > > > This means we end up spending some CPU time (in the guest) copying the > > > memory > > > from >4GB to 0-4GB region (And vice-versa). > > > > This reminds me of something (not sure what XenoLinux you use for > > comparison) - how are they allocating that memory? Not vmalloc_32() > > I was using the 2.6.18, then the one I saw on Google for Gentoo, and now > I am going to look at the 2.6.38 from OpenSuSE. > > > by chance (I remember having seen numerous uses under - iirc - > > drivers/media/)? > > > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > > what their (driver) callers might expect in a PV guest (including the > > contiguity assumption for the latter, recalling that you earlier said > > you were able to see the problem after several guest starts), and I > > had put into our kernels an adjustment to make vmalloc_32() actually > > behave as expected. > > Aaah.. The plot thickens! Let me look in the sources! Thanks for the > pointer.Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area. So I cobbled up the attached patch (hadn''t actually tested it and sadly won''t until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages. If that fixes it for you that is awesome, but if it breaks please send me your logs. Cheers, Konrad _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
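For scale (my arithmetic, not a figure from the thread):

    329 pages x 4096 bytes/page = 1,347,584 bytes, i.e. roughly 1.3 MB of streaming buffer per card, about 2.6 MB for both cards.

If the machine frames behind those buffers end up above 4GB, portions of them are copied through the swiotlb bounce buffer on every interrupt, which would fit the roughly doubled idle load reported at the start of this thread.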
Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is reduced a bit, but noticably. It''s only a simple test, running the DomU for 2 minutes, but the idle load is aprox. - 2.6.32 pvops 12-13% - 3.2.1 pvops 10-11% - 2.6.34 XenoLinux 7-8% BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Montag, 23. Januar 2012 23:32 An: Konrad Rzeszutek Wilk Cc: Sander Eikelenboom; xen-devel; Jan Beulich Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > > The issue as I understand is that the DVB drivers allocate their > > > buffers from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > > > While the pv-ops one ends up quite frequently doing the > > > bounce-buffering, which implies that the DVB drivers end up > > > allocating their buffers above the 4GB. > > > This means we end up spending some CPU time (in the guest) copying > > > the memory from >4GB to 0-4GB region (And vice-versa). > > > > This reminds me of something (not sure what XenoLinux you use for > > comparison) - how are they allocating that memory? Not vmalloc_32() > > I was using the 2.6.18, then the one I saw on Google for Gentoo, and > now I am going to look at the 2.6.38 from OpenSuSE. > > > by chance (I remember having seen numerous uses under - iirc - > > drivers/media/)? > > > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > > what their (driver) callers might expect in a PV guest (including > > the contiguity assumption for the latter, recalling that you earlier > > said you were able to see the problem after several guest starts), > > and I had put into our kernels an adjustment to make vmalloc_32() > > actually behave as expected. > > Aaah.. The plot thickens! Let me look in the sources! Thanks for the > pointer.Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area. So I cobbled up the attached patch (hadn''t actually tested it and sadly won''t until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages. If that fixes it for you that is awesome, but if it breaks please send me your logs. Cheers, Konrad _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2012-Jan-25 21:02 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approx.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%

Yeah. I think this is due to the fix I added in xen-swiotlb to not always do the bounce copying.

> - 2.6.34 XenoLinux 7-8%
>
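The fix referred to here is in the xen-swiotlb map path. Simplified - a sketch of the decision, not a verbatim quote of drivers/xen/swiotlb-xen.c - the idea is to bounce only when the machine address of the buffer is not reachable by the device:

    #include <linux/dma-mapping.h>
    #include <asm/xen/page.h>

    /* Sketch only: does this buffer need the bounce buffer for this device? */
    static bool needs_bounce(struct device *dev, phys_addr_t phys, size_t size)
    {
        /* Translate the guest pseudo-physical address into the machine
         * address the card will actually see on the bus. */
        dma_addr_t dev_addr = phys_to_machine(XPADDR(phys)).maddr;

        /* The real code also worries about buffers straddling machine-
         * discontiguous pages and about swiotlb=force; omitted here. */
        return !dma_capable(dev, dev_addr, size);
    }

When the backing machine frame is below the device's 4GB limit this returns false and the buffer is handed over in place; otherwise the data is first copied into a bounce slot below 4GB, and that copying is the extra CPU time the DomU gets charged for.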
Konrad Rzeszutek Wilk
2012-Feb-15 19:28 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approx.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%
> - 2.6.34 XenoLinux 7-8%

I took a stab at Jan's idea - it compiles but I haven't been able to properly test it.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
>>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 	struct page **pages;
> 	unsigned int nr_pages, array_size, i;
> 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>-
>+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
>+	if (xen_pv_domain()) {
>+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

I didn't spot where you force this normally invalid combination, without
which the change won't affect vmalloc_32() in a 32-bit kernel.

>+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);

gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);

Jan

>+	}
> 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> 	array_size = (nr_pages * sizeof(struct page *));
>
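For anyone skimming the archive, the one-character difference Jan points out here is the usual clear-bits idiom; a stand-alone illustration (the flag values are made up, not the kernel's):

    #include <stdio.h>

    #define __GFP_DMA   0x01u   /* illustrative values only */
    #define __GFP_DMA32 0x04u
    #define GFP_KERNEL  0xd0u

    int main(void)
    {
        unsigned int keep_only = GFP_KERNEL | __GFP_DMA | __GFP_DMA32;
        unsigned int cleared   = keep_only;

        keep_only &=  (__GFP_DMA | __GFP_DMA32);  /* keeps ONLY the DMA bits */
        cleared   &= ~(__GFP_DMA | __GFP_DMA32);  /* strips the DMA bits     */

        printf("%#x %#x\n", keep_only, cleared);  /* prints 0x5 0xd0 */
        return 0;
    }

With &= (mask) it is the mask bits that survive, so GFP_KERNEL would be thrown away and only the DMA flags kept; with &= ~(mask) the DMA flags are cleared, which is what the patch intends.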
Konrad Rzeszutek Wilk
2012-Feb-17 15:07 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > 	struct page **pages;
> > 	unsigned int nr_pages, array_size, i;
> > 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+	if (xen_pv_domain()) {
> >+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
>
> >+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);
>
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
>
> Jan

Duh! Good eyes. Thanks for catching that.

> >+	}
> > 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > 	array_size = (nr_pages * sizeof(struct page *));
> >
>
Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Great news: it works and load is back to normal. In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
I am very sorry. I accidently started the DomU with the wrong config file, thus it''s clear why there is no difference between the two. And unfortunately, the DomU with the correct config file is having a BUG: [ 14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000 [ 14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.674930] PGD 0 [ 14.674940] Oops: 0002 [#1] SMP [ 14.674952] CPU 0 [ 14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront [ 14.675057] [ 14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1 [ 14.675079] RIP: e030:[<ffffffff811b4c0b>] [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.675097] RSP: e02b:ffff880013fabe58 EFLAGS: 00010202 [ 14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000 [ 14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000 [ 14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000 [ 14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090 [ 14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8 [ 14.675163] FS: 00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000 [ 14.675175] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660 [ 14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020) [ 14.675227] Stack: [ 14.675232] ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000 [ 14.675251] 00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd [ 14.675270] ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0 [ 14.675289] Call Trace: [ 14.675295] <IRQ> [ 14.675307] [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47 [ 14.675322] [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core] [ 14.675337] [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184 [ 14.675350] [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8 [ 14.675364] [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5 [ 14.675376] [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77 [ 14.675388] [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0 [ 14.675400] [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205 [ 14.675412] [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30 [ 14.675425] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79 [ 14.675436] [<ffffffff8104c996>] ? irq_exit+0x44/0xb5 [ 14.675452] [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32 [ 14.675464] [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30 [ 14.675473] <EOI> Complete log is attached. BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 13:16 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Great news: it works and load is back to normal. 
In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Hi Konrad, don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG, so we still have not checked whether our theory is correct. BR, Carsten. -----Ursprüngliche Nachricht----- Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 14:01 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:debug.log, inline.txt An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; I am very sorry. I accidently started the DomU with the wrong config file, thus it''s clear why there is no difference between the two. And unfortunately, the DomU with the correct config file is having a BUG: [ 14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000 [ 14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.674930] PGD 0 [ 14.674940] Oops: 0002 [#1] SMP [ 14.674952] CPU 0 [ 14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront [ 14.675057] [ 14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1 [ 14.675079] RIP: e030:[<ffffffff811b4c0b>] [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.675097] RSP: e02b:ffff880013fabe58 EFLAGS: 00010202 [ 14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000 [ 14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000 [ 14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000 [ 14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090 [ 14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8 [ 14.675163] FS: 00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000 [ 14.675175] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660 [ 14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020) [ 14.675227] Stack: [ 14.675232] ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000 [ 14.675251] 00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd [ 14.675270] ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0 [ 14.675289] Call Trace: [ 14.675295] <IRQ> [ 14.675307] [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47 [ 14.675322] [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core] [ 14.675337] [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184 [ 14.675350] [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8 [ 14.675364] [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5 [ 14.675376] [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77 [ 14.675388] [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0 [ 14.675400] [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205 [ 14.675412] [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30 [ 14.675425] [<ffffffff8100fa47>] ? 
do_softirq+0x3f/0x79 [ 14.675436] [<ffffffff8104c996>] ? irq_exit+0x44/0xb5 [ 14.675452] [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32 [ 14.675464] [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30 [ 14.675473] <EOI> Complete log is attached. BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 13:16 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Great news: it works and load is back to normal. In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel -------------------------------- E-Mail ist virenfrei. Von AVG überprüft - www.avg.de Version: 2012.0.2127 / Virendatenbank: 2411/4932 - Ausgabedatum: 12.04.2012 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-May-11 19:41 UTC
Re: Load increase after memory upgrade (part2)
On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> Hi Konrad,
>
> don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load.
>
> But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG,

Yes, that is right. Thank you for reminding me.

> so we still have not checked whether our theory is correct.

No we haven't. And I should have no trouble reproducing this. I can just write
a tiny module that allocates vmalloc_32(). But your timing sucks - I am going
on a week's vacation next week :-(

Ah, if there were just a cloning machine - I could stick myself in it, and
Baseline_0 goes on vacation, while Clone_1 goes on working. Then git merge
Baseline_0 and Clone_1 in a week, fix up the merge conflicts and continue on.
Sigh.

Can I ask you to be patient with me once more and ping me in a week - when I am
back from vacation and my brain is fresh to work on this?
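A sketch of the kind of throw-away test module described above (my own reconstruction, untested, assuming a 64-bit PV DomU with CONFIG_XEN): it allocates a buffer the size of one saa7146 ring with vmalloc_32() and counts how many of the backing machine frames actually sit above 4GB.

    #include <linux/module.h>
    #include <linux/vmalloc.h>
    #include <linux/mm.h>
    #include <asm/xen/page.h>

    #define TEST_PAGES 329  /* same size as one saa7146 buffer, per the report above */

    static void *buf;

    static int __init vm32_test_init(void)
    {
        int i, above = 0;

        buf = vmalloc_32(TEST_PAGES * PAGE_SIZE);
        if (!buf)
            return -ENOMEM;

        for (i = 0; i < TEST_PAGES; i++) {
            unsigned long pfn = page_to_pfn(vmalloc_to_page(buf + i * PAGE_SIZE));
            unsigned long mfn = pfn_to_mfn(pfn);

            /* frames at or above 4GB cannot be handed to a 32-bit DMA engine */
            if (mfn >= (1UL << (32 - PAGE_SHIFT)))
                above++;
        }
        pr_info("vm32_test: %d of %d pages are backed by machine frames above 4GB\n",
                above, TEST_PAGES);
        return 0;
    }

    static void __exit vm32_test_exit(void)
    {
        vfree(buf);
    }

    module_init(vm32_test_init);
    module_exit(vm32_test_exit);
    MODULE_LICENSE("GPL");

With the Xenified kernel, or with the exchange patch applied, the count should be zero; with plain mainline on the 8GB box it should be large, which would confirm the theory directly.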
Konrad Rzeszutek Wilk
2012-Jun-13 16:55 UTC
Re: Load increase after memory upgrade (part2)
On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can just write > a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. */ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); +#endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + 
if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); +#endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. */ - success = xen_exchange_memory(1, order, &in_frame, 1UL << order, - 0, out_frames, 0); + success = xen_exchange_memory(1, order, &in_frame, + 1UL << order, 0, out_frames, 0); /* 4. Map new pages in place of old pages. */ if (success) @@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) } EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits) +{ + unsigned long *in_frames = discontig_frames, *out_frames = limited_frames; + unsigned long flags; + struct page *page; + int success; + int i, n = 0; + unsigned long _limit_map; + unsigned long *limit_map; + + if (xen_feature(XENFEAT_auto_translated_physmap)) + return 0; + + if (unlikely(order > MAX_CONTIG_ORDER)) + return -ENOMEM; + + if (BITS_PER_LONG >> order) { + limit_map = kzalloc(BITS_TO_LONGS(1U << order) * + sizeof(*limit_map), GFP_KERNEL); + if (unlikely(!limit_map)) + return -ENOMEM; + } else + limit_map = &_limit_map; + + /* 0. Construct our per page bitmap lookup. */ + + if (address_bits && (address_bits < PAGE_SHIFT)) + return -EINVAL; + + if (order) + bitmap_zero(limit_map, 1U << order); + else + __set_bit(0, limit_map); + + /* 1. Clear the pages */ + for (i = 0; i < (1ULL << order); i++) { + void *vaddr; + page = &pages[i]; + + vaddr = page_address(page); +#ifdef DEBUG + printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr))); +#endif + if (address_bits) { + if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT))) + continue; + __set_bit(i, limit_map); + } + if (!PageHighMem(page)) + memset(vaddr, 0, PAGE_SIZE); + else { + memset(kmap(page), 0, PAGE_SIZE); + kunmap(page); + ++n; + } + } + /* Check to see if we actually have to do any work. 
*/ + if (bitmap_empty(limit_map, 1U << order)) { + if (limit_map != &_limit_map) + kfree(limit_map); + return 0; + } + if (n) + kmap_flush_unused(); + + spin_lock_irqsave(&xen_reservation_lock, flags); + + /* 2. Zap current PTEs. */ + n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map); + + /* 3. Do the exchange for non-contiguous MFNs. */ + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames, + n, 0, out_frames, address_bits); + + /* 4. Map new pages in place of old pages. */ + if (success) + xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map); + else + xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map); + + spin_unlock_irqrestore(&xen_reservation_lock, flags); + if (limit_map != &_limit_map) + kfree(limit_map); + + return success ? 0 : -ENOMEM; +} +EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn); #ifdef CONFIG_XEN_PVHVM static void xen_hvm_exit_mmap(struct mm_struct *mm) { diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 6a198e4..2f8709f 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, unsigned long mfn, int nr, pgprot_t prot, unsigned domid); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits); #endif /* INCLUDE_XEN_OPS_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 2aad499..194af07 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -31,6 +31,8 @@ #include <asm/tlbflush.h> #include <asm/shmparam.h> +#include <xen/xen.h> +#include <xen/xen-ops.h> /*** Page table manipulation functions ***/ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end) @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; - + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); + if (xen_pv_domain()) { + if (dma_mask == (__GFP_DMA | __GFP_DMA32)) + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); + } nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; } area->pages[i] = page; + if (xen_pv_domain()) { + if (dma_mask) { + if (xen_limit_pages_to_max_mfn(page, 0, 32)) { + area->nr_pages = i + 1; + goto fail; + } + if (gfp_mask & __GFP_ZERO) + clear_highpage(page); + } + } } if (map_vm_area(area, prot, &pages)) -- 1.7.7.6
>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever
set both flags at once, this whole conditional is meaningless. In our
code - which I suppose is where you cloned this from - we set
GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would
merely use GFP_KERNEL, and hence not trigger the code calling
xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's
problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>
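To make that concrete: with the stock mainline definitions, vmalloc_32() passes GFP_DMA32 | GFP_KERNEL on a 64-bit kernel and plain GFP_KERNEL on 32-bit, so the dma_mask == (__GFP_DMA | __GFP_DMA32) comparison never fires on any stock configuration. As far as one can tell from the quoted patch, the per-page exchange further down still runs on a 64-bit DomU (dma_mask is then __GFP_DMA32 alone), while a 32-bit DomU gets no help at all - which is Jan's point. The XenoLinux variant quoted near the top of this page is what makes the both-flags case possible:

    /* mm/vmalloc.c - stock mainline definitions at the time of this thread */
    #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
    #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL   /* 64-bit DomU: __GFP_DMA32 only */
    #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
    #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
    #else
    #define GFP_VMALLOC32 GFP_KERNEL               /* 32-bit: no DMA flag at all */
    #endif

    /* The XenoLinux chain adds, before the final #else, a branch that sets
     * both flags on 32-bit Xen kernels - a sketch of the companion hunk a
     * mainline version of this patch would also need:
     *
     *  #elif defined(CONFIG_XEN)
     *  #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
     */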
On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
>
> +	/* 3. Do the exchange for non-contiguous MFNs. */
> +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> +				      n, 0, out_frames, address_bits);

vmalloc() does not require physically contiguous MFNs.

David
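In other words, a vmalloc area is stitched together page by page, so machine-frame contiguity never matters here - only whether each individual frame is reachable by the card. A minimal illustration (not code from the thread):

    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* Pages whose machine frames are scattered across RAM can still be
     * mapped virtually contiguous; vmap() neither needs nor provides
     * machine contiguity. */
    static void *map_scattered_pages(struct page **pages, unsigned int nr)
    {
        return vmap(pages, nr, VM_MAP, PAGE_KERNEL);
    }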
Konrad Rzeszutek Wilk
2012-Jun-14 18:31 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Jun 14, 2012 at 09:38:31AM +0100, David Vrabel wrote:
> On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
> >
> > +	/* 3. Do the exchange for non-contiguous MFNs. */
> > +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> > +				      n, 0, out_frames, address_bits);
>
> vmalloc() does not require physically contiguous MFNs.

<nods> It doesn't matter that much in this context as the vmalloc code calls this
per page - so it is only one page that is swizzled.

>
> David
Konrad Rzeszutek Wilk
2012-Jun-14 18:33 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Jun 14, 2012 at 08:07:55AM +0100, Jan Beulich wrote:
> >>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	struct page **pages;
> >  	unsigned int nr_pages, array_size, i;
> >  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -
> > +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> > +	if (xen_pv_domain()) {
> > +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> As said in an earlier reply - without having any place that would
> ever set both flags at once, this whole conditional is meaningless.
> In our code - which I suppose is where you cloned this from - we

Yup.

> set GFP_VMALLOC32 to such a value for 32-bit kernels (which
> otherwise would merely use GFP_KERNEL, and hence not trigger

Ah, let me double check. Thanks for looking out for this.

> the code calling xen_limit_pages_to_max_mfn()). I don't recall
> though whether Carsten's problem was on a 32- or 64-bit kernel.
>
> Jan
>
> > +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> > +	}
> >  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> >  	array_size = (nr_pages * sizeof(struct page *));
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad, against which kernel version did you produce this patch? It will not succeed with 3.4.2 at least, will look up some older version now... -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Mittwoch, 13. Juni 2012 18:55 An: Carsten Schiers Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the > > latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can > just write a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. 
*/ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); #endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); #endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. 
*/ - success = xen_exchange_memory(1, order, &in_frame, 1UL << order, - 0, out_frames, 0); + success = xen_exchange_memory(1, order, &in_frame, + 1UL << order, 0, out_frames, 0); /* 4. Map new pages in place of old pages. */ if (success) @@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) } EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits) +{ + unsigned long *in_frames = discontig_frames, *out_frames = limited_frames; + unsigned long flags; + struct page *page; + int success; + int i, n = 0; + unsigned long _limit_map; + unsigned long *limit_map; + + if (xen_feature(XENFEAT_auto_translated_physmap)) + return 0; + + if (unlikely(order > MAX_CONTIG_ORDER)) + return -ENOMEM; + + if (BITS_PER_LONG >> order) { + limit_map = kzalloc(BITS_TO_LONGS(1U << order) * + sizeof(*limit_map), GFP_KERNEL); + if (unlikely(!limit_map)) + return -ENOMEM; + } else + limit_map = &_limit_map; + + /* 0. Construct our per page bitmap lookup. */ + + if (address_bits && (address_bits < PAGE_SHIFT)) + return -EINVAL; + + if (order) + bitmap_zero(limit_map, 1U << order); + else + __set_bit(0, limit_map); + + /* 1. Clear the pages */ + for (i = 0; i < (1ULL << order); i++) { + void *vaddr; + page = &pages[i]; + + vaddr = page_address(page); +#ifdef DEBUG + printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", +__func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr))); #endif + if (address_bits) { + if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT))) + continue; + __set_bit(i, limit_map); + } + if (!PageHighMem(page)) + memset(vaddr, 0, PAGE_SIZE); + else { + memset(kmap(page), 0, PAGE_SIZE); + kunmap(page); + ++n; + } + } + /* Check to see if we actually have to do any work. */ + if (bitmap_empty(limit_map, 1U << order)) { + if (limit_map != &_limit_map) + kfree(limit_map); + return 0; + } + if (n) + kmap_flush_unused(); + + spin_lock_irqsave(&xen_reservation_lock, flags); + + /* 2. Zap current PTEs. */ + n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, +limit_map); + + /* 3. Do the exchange for non-contiguous MFNs. */ + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames, + n, 0, out_frames, address_bits); + + /* 4. Map new pages in place of old pages. */ + if (success) + xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map); + else + xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map); + + spin_unlock_irqrestore(&xen_reservation_lock, flags); + if (limit_map != &_limit_map) + kfree(limit_map); + + return success ? 
0 : -ENOMEM; +} +EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn); #ifdef CONFIG_XEN_PVHVM static void xen_hvm_exit_mmap(struct mm_struct *mm) { diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 6a198e4..2f8709f 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, unsigned long mfn, int nr, pgprot_t prot, unsigned domid); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits); #endif /* INCLUDE_XEN_OPS_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 2aad499..194af07 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -31,6 +31,8 @@ #include <asm/tlbflush.h> #include <asm/shmparam.h> +#include <xen/xen.h> +#include <xen/xen-ops.h> /*** Page table manipulation functions ***/ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end) @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; - + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); + if (xen_pv_domain()) { + if (dma_mask == (__GFP_DMA | __GFP_DMA32)) + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); + } nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; } area->pages[i] = page; + if (xen_pv_domain()) { + if (dma_mask) { + if (xen_limit_pages_to_max_mfn(page, 0, 32)) { + area->nr_pages = i + 1; + goto fail; + } + if (gfp_mask & __GFP_ZERO) + clear_highpage(page); + } + } } if (map_vm_area(area, prot, &pages)) -- 1.7.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ----- E-Mail ist virenfrei. Von AVG überprüft - www.avg.de Version: 2012.0.2180 / Virendatenbank: 2433/5067 - Ausgabedatum: 13.06.2012
It's a 64-bit kernel...

-----Ursprüngliche Nachricht-----
Von: Jan Beulich [mailto:JBeulich@suse.com]
Gesendet: Donnerstag, 14. Juni 2012 09:08
An: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk; Sander Eikelenboom; xen-devel; Carsten Schiers
Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2)

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever set both flags at once, this whole conditional is meaningless. In our code - which I suppose is where you cloned this from - we set GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would merely use GFP_KERNEL, and hence not trigger the code calling xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>
OK, found the problem in the patch file, baking 3.4.2...BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Carsten Schiers Gesendet: Donnerstag, 14. Juni 2012 20:40 An: Konrad Rzeszutek Wilk Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) Konrad, against which kernel version did you produce this patch? It will not succeed with 3.4.2 at least, will look up some older version now... -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Mittwoch, 13. Juni 2012 18:55 An: Carsten Schiers Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the > > latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can > just write a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. 
*/ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); #endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); #endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. 
-	success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
-				      0, out_frames, 0);
+	success = xen_exchange_memory(1, order, &in_frame,
+				      1UL << order, 0, out_frames, 0);
 
 	/* 4. Map new pages in place of old pages. */
 	if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if (BITS_PER_LONG >> order) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+		return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < (1ULL << order); i++) {
+		void *vaddr;
+		page = &pages[i];
+
+		vaddr = page_address(page);
+#ifdef DEBUG
+		printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n",
+			__func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+		if (address_bits) {
+			if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */,
+			       limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+				if (gfp_mask & __GFP_ZERO)
+					clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))
-- 
1.7.7.6

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
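As an aside, the "tiny module that allocates vmalloc_32()" that Konrad mentions above can be sketched in a few lines without any DVB hardware. The sketch below is only an illustration and not part of the patch or the thread: the module name, the nr_pages default and the log text are invented here; it simply grabs a vmalloc_32() buffer, touches every page, logs the PFN backing each page and frees the buffer on unload.

/*
 * vm32_test: minimal sketch of a module that allocates vmalloc_32().
 * Illustrative only - names and defaults are not taken from the thread.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

static void *buf;
static unsigned long nr_pages = 256;	/* 1 MB worth of pages by default */
module_param(nr_pages, ulong, 0444);

static int __init vm32_test_init(void)
{
	unsigned long i;

	/* vmalloc_32() is the allocation path under discussion in this thread. */
	buf = vmalloc_32(nr_pages * PAGE_SIZE);
	if (!buf)
		return -ENOMEM;

	/* Touch every page and log which PFN backs it. */
	for (i = 0; i < nr_pages; i++) {
		struct page *page = vmalloc_to_page(buf + i * PAGE_SIZE);

		memset(buf + i * PAGE_SIZE, 0, PAGE_SIZE);
		pr_info("vm32_test: page %lu -> pfn 0x%lx\n",
			i, page_to_pfn(page));
	}
	return 0;
}

static void __exit vm32_test_exit(void)
{
	vfree(buf);
}

module_init(vm32_test_init);
module_exit(vm32_test_exit);
MODULE_LICENSE("GPL");

Loading something like this in the 8GB domU should also fire the DEBUG printks added by the patch, since vmalloc_32() requests __GFP_DMA32 on 64-bit; those printks already report the pfn/mfn pairs being exchanged, so no extra instrumentation would be needed to see whether the exchange path is taken.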