Hello again, I would like to come back to that thing... sorry that I did not have the time up to now.

We (now) speak about:

* Xen 4.1.2
* Dom0 is Jeremy's 2.6.32.46 64 bit
* DomU in question is now 3.1.2 64 bit
* Same thing if DomU is also 2.6.32.46
* DomU owns two PCI cards (DVB-C) that do DMA
* Machine has 8GB, Dom0 pinned at 512MB

As compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It
will be "close to normal" if I reduce the memory used to 4GB.

As you can see from the attachment, you once had an idea. So should we try to find something...?

Carsten.

-----Original Message-----
To: konrad.wilk <konrad.wilk@oracle.com>;
Cc: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>;
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.06.2011 23:17
Subject: AW: Re: Re: Re: AW: Re: [Xen-devel] AW: Load increase after memory upgrade?

> Let's first do the c) experiment as that will likely explain your load average increase.
> ...
> > c). If you want to see if the fault here lies in the bounce buffer being used more
> > often in the DomU b/c you have 8GB of memory now and you end up using more pages
> > past 4GB (in DomU), I can cook up a patch to figure this out. But an easier way is
> > to just do (on the Xen hypervisor line): mem=4G and that will make it think you only
> > have 4GB of physical RAM. If the load comes back to the normal "amount" then the
> > likely culprit is that and we can think on how to fix this.

You are on the right track. Load was going down to "normal" 10% when reducing
Xen to 4GB by the parameter. Load seems to be still a little, little bit lower
with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
before.
Konrad Rzeszutek Wilk
2011-Nov-25 18:42 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to that thing... sorry that I did not have the time up to now.
>
> We (now) speak about:
>
> * Xen 4.1.2
> * Dom0 is Jeremy's 2.6.32.46 64 bit
> * DomU in question is now 3.1.2 64 bit
> * Same thing if DomU is also 2.6.32.46
> * DomU owns two PCI cards (DVB-C) that do DMA
> * Machine has 8GB, Dom0 pinned at 512MB
>
> As compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least twice as high. It
> will be "close to normal" if I reduce the memory used to 4GB.

That is in the dom0 or just in general on the machine?

> As you can see from the attachment, you once had an idea. So should we try to find something...?

I think that was to instrument swiotlb to give an idea of how often it is called
and basically have a metric of its load. And from there figure out if the issue
is that:

1). The drivers allocate/bounce/deallocate buffers on every interrupt
    (bad, the driver should be using some form of DMA pool, and most of the
    ivtv ones do that).

2). The buffers allocated to the drivers are above the 4GB mark and we end
    up bouncing them needlessly. That can happen if the dom0 has most of
    the precious memory under 4GB. However, that is usually not the case,
    as the domain is usually allocated from the top of the memory. The
    fix for that was to set dom0_mem=max:XX... but with Dom0 kernels
    before 3.1, the parameter would be ignored, so you had to use
    'mem=XX' on the Linux command line as well.

3). Where did you get the load values? Was it dom0? Or domU?
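For reference, the two settings mentioned above would be combined roughly like this in a legacy GRUB entry; the sizes, file names and paths are only examples, not taken from the thread:

    title Xen 4.1.2 / Dom0 2.6.32.46
        root (hd0,0)
        kernel /boot/xen-4.1.2.gz dom0_mem=max:512M
        module /boot/vmlinuz-2.6.32.46-xen mem=512M root=/dev/sda1 ro console=tty0
        module /boot/initrd-2.6.32.46-xen.img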
I got the values in DomU. I will have:

- approx. 5% load in DomU with the 2.6.34 Xenified kernel
- approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with one card attached
- approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with two cards attached

I looked through my old mails from you and you already explained the necessity of double
bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
Xenified kernel not have this kind of issue?

The driver in question is nearly identical between the two kernel versions. It is in
drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
question is:

    /* allocate and init buffers */
    av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);
    if (!av7110->debi_virt)
            goto err_saa71466_vfree_4;

isn't it? I think the cards are constantly transferring the received stream through DMA.

I have set dom0_mem=512M by the way, shall I change that in some way?

I can try out some things, if you want me to. But I have no idea what to do and where to
start, so I rely on your help...

Carsten.
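Since the question of why a coherent allocation should not need bouncing keeps coming up, here is a minimal illustration of the two DMA styles involved; this is not code from the av7110 driver, and the demo_* names are made up:

    /* Sketch: coherent vs. streaming DMA for a 32-bit-only PCI card.
     * demo_* names are placeholders, not part of the real driver. */
    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static void demo_dma_styles(struct pci_dev *pdev, void *data, size_t len)
    {
            dma_addr_t coherent_bus, stream_bus;
            void *buf;

            /* Coherent allocation: the DMA API hands back memory the card can
             * address (below 4GB here), so it never needs bouncing - this is
             * what pci_alloc_consistent() in av7110 does once at init time. */
            buf = pci_alloc_consistent(pdev, 8192, &coherent_bus);
            if (!buf)
                    return;

            /* Streaming mapping: 'data' can live anywhere in guest memory; if
             * it sits above what the card can reach, swiotlb copies it into a
             * bounce buffer on map/sync - the cost the thread is chasing. */
            stream_bus = pci_map_single(pdev, data, len, PCI_DMA_TODEVICE);
            if (!pci_dma_mapping_error(pdev, stream_bus))
                    pci_unmap_single(pdev, stream_bus, len, PCI_DMA_TODEVICE);

            pci_free_consistent(pdev, 8192, buf, coherent_bus);
    }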
To add (read from some munin statistics I made over the time):

- with load I mean the %CPU of xentop
- there is no change in CPU usage of the DomU or Dom0
- xenpm shows that the core dedicated to that DomU is doing more work

Also I need to say that the reduction to 4GB was performed by the Xen parameter.

Carsten.
Konrad Rzeszutek Wilk
2011-Nov-28 15:28 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I will have:
>
> - approx. 5% load in DomU with the 2.6.34 Xenified kernel
> - approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with one card attached
> - approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel with two cards attached

HA! I just wonder if the issue is that the reporting of CPU time spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops code when
it came to accounting of CPU time.

> I looked through my old mails from you and you already explained the necessity of double
> bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same - both
use the generic SWIOTLB, which has not changed for years.

> The driver in question is nearly identical between the two kernel versions. It is in
> drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
> question is:
>
>     /* allocate and init buffers */
>     av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates it during init and uses it.

>     if (!av7110->debi_virt)
>             goto err_saa71466_vfree_4;
>
> isn't it? I think the cards are constantly transferring the received stream through DMA.

Yeah, and that memory is set aside for the life of the driver. So there
should be no bounce buffering happening (as it allocated the memory below
the 4GB mark).

> I have set dom0_mem=512M by the way, shall I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
Konrad Rzeszutek Wilk
2011-Nov-28 15:30 UTC
Re: Load increase after memory upgrade (part2)
On Sat, Nov 26, 2011 at 10:14:08AM +0100, Carsten Schiers wrote:
> To add (read from some munin statistics I made over the time):
>
> - with load I mean the %CPU of xentop
> - there is no change in CPU usage of the DomU or Dom0

Uhh, which metric are you using for that? CPU usage...? Is this when you change
the DomU or the amount of memory the guest has? This is not the load number
(xentop value)?

> - xenpm shows that the core dedicated to that DomU is doing more work
>
> Also I need to say that the reduction to 4GB was performed by the Xen parameter.
>
> Carsten.
On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > I looked through my old mails from you and you already explained the necessity of double
> > bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> > Xenified kernel not have this kind of issue?
>
> That is a puzzle. It should not. The code is very much the same - both
> use the generic SWIOTLB, which has not changed for years.

The swiotlb-xen used by classic-xen kernels (which I assume is what
Carsten means by "Xenified") isn't exactly the same as the stuff in
mainline Linux; it's been heavily refactored for one thing. It's not
impossible that mainline is bouncing something it doesn't really need
to.

It's also possible that the DMA mask of the device is different/wrong in
mainline, leading to such additional bouncing.

I guess it's also possible that the classic-Xen kernels are playing fast
and loose by not bouncing something they should (although if so they
appear to be getting away with it...) or that there is some difference
which really means mainline needs to bounce while classic-Xen doesn't.

Ian.
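To illustrate the DMA-mask point, this is roughly how a driver declares what its device can address; a minimal sketch under that assumption, not taken from the budget_av/av7110 driver:

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int example_probe(struct pci_dev *pdev)
    {
            /* Declare that the device can only address 32 bits. A mask wider
             * than the hardware really supports would hand the card addresses
             * above 4GB it cannot reach; a mask narrower than needed would
             * cause extra bouncing of streaming mappings. */
            if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            /* Coherent allocations honour a separate mask. */
            if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            return 0;
    }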
Hi,

let me try to explain a bit more. Here you see the output of my xentop munin graph
for a week. Only take a look at the bluish buckle. Notice the small step in front?
So it's the CPU permille used by the DomU that owns the cards. The small buckle is
when I only put in one PCI card. Afterwards it's a constantly noticeably higher load.
See that Dom0 (green) is not impacted. I am back to the Xenified kernel, as you can see.

In the next picture you see the output of xenpm visualized. So this might be an
indicator that really something happens. It's only the core that I dedicated to that
DomU. I have a three-core AMD CPU by the way.

In the CPU usage of the Dom0, there is nothing to see. In the CPU usage of the DomU,
there is also not much to see, eventually a very slight change of mix. There is a
slight increase in sleeping jobs at the time slot in question; I guess nothing we
can directly map to the issue.

If you need other charts, I can try to produce them.

BR, Carsten.
Konrad Rzeszutek Wilk
2011-Nov-28 16:45 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > > I looked through my old mails from you and you already explained the necessity of double
> > > bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
> > > Xenified kernel not have this kind of issue?
> >
> > That is a puzzle. It should not. The code is very much the same - both
> > use the generic SWIOTLB, which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
being done. The alloc_coherent will allocate a nice page underneath the 4GB
mark and give it to the driver. The driver can use it as it wishes and there
is no need to bounce buffer.

But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
as if it is using map_single, which would be taking the memory out of the
pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
[Note: I looked at the 2.6.18 hg tree for classic; the 2.6.34 one is probably
much improved, so let me check that.]

Carsten, let me prep up a patch that will print some diagnostic information
during runtime - to see how often it does the bounce, the usage, etc.

> It's also possible that the DMA mask of the device is different/wrong in
> mainline, leading to such additional bouncing.

If one were to use map_page and such - yes. But the alloc_coherent bypasses
that and ends up allocating it right under the 4GB mark (or rather it allocates
based on the dev->coherent_dma_mask and swizzles the MFNs as required).

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

<nods> Could be very well.

> Ian.
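As a rough picture of the streaming-map side being discussed, the decision swiotlb-xen has to make looks something like the sketch below; this is simplified illustrative code, not the actual mainline implementation, and phys_to_bus()/bounce_into_pool() are stand-ins for the real helpers:

    /* Simplified sketch of a streaming DMA map with bounce buffering.
     * phys_to_bus() and bounce_into_pool() are placeholders for the real
     * swiotlb-xen helpers. */
    dma_addr_t sketch_map_page(struct device *dev, struct page *page,
                               unsigned long offset, size_t size)
    {
            phys_addr_t phys = page_to_phys(page) + offset;
            dma_addr_t bus = phys_to_bus(phys);    /* pfn -> mfn translation */

            /* If the bus address already fits the device's DMA mask (and the
             * underlying MFNs are contiguous), hand it straight to the card. */
            if (dma_capable(dev, bus, size))
                    return bus;

            /* Otherwise copy the data into the pre-allocated pool below 4GB
             * and give the device the pool address instead - this is the
             * "bounce" that the debug output later in the thread counts. */
            return bounce_into_pool(dev, phys, size);
    }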
On 11/28/11 16:40, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
>> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
>
>>> I looked through my old mails from you and you already explained the necessity of double
>>> bounce buffering (PCI -> below 4GB -> above 4GB). What I don't understand is: why does the
>>> Xenified kernel not have this kind of issue?
>>
>> That is a puzzle. It should not. The code is very much the same - both
>> use the generic SWIOTLB, which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Please excuse me if I'm completely mistaken; my only point of reference is
that we recently had to backport
<http://xenbits.xensource.com/hg/linux-2.6.18-xen.hg/rev/940>.

> It's also possible that the DMA mask of the device is different/wrong in
> mainline, leading to such additional bouncing.

dma_alloc_coherent() -- which I guess is the precursor of
pci_alloc_consistent() -- asks xen_create_contiguous_region() to back the
vaddr range with frames machine-addressable inside the device's DMA mask.
xen_create_contiguous_region() seems to land in a XENMEM_exchange hypercall
(among others). Perhaps this extra layer of indirection allows the driver to
use low pages directly, without bounce buffers.

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

I'm sorry if what I just posted is painfully stupid. I'm taking the risk for
the 1% chance that it could be helpful.

Wrt. the idle time accounting problem, after Niall's two pings, I'm also
waiting for a verdict, and/or for myself finding the time to fish out the
current patches.

Laszlo
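A rough sketch of the classic-Xen coherent allocation flow described above, with simplified names and error handling; the real code is in arch/i386/kernel/pci-dma-xen.c and the exact signature of xen_create_contiguous_region() varies between trees:

    /* Illustrative sketch of a classic-Xen style dma_alloc_coherent() path;
     * not the code from pci-dma-xen.c. */
    void *sketch_alloc_coherent(struct device *dev, size_t size,
                                dma_addr_t *dma_handle, gfp_t gfp)
    {
            unsigned int order = get_order(size);
            unsigned long vstart = __get_free_pages(gfp, order);

            if (!vstart)
                    return NULL;

            /* Exchange the backing frames (XENMEM_exchange under the hood) so
             * that the region is machine-contiguous and sits within the
             * device's coherent DMA mask - e.g. below 4GB for a 32-bit card.
             * After this, the driver can DMA to the buffer directly, with no
             * bounce buffering for its lifetime. */
            if (xen_create_contiguous_region(vstart, order,
                                             fls64(dev->coherent_dma_mask)) < 0) {
                    free_pages(vstart, order);
                    return NULL;
            }

            *dma_handle = virt_to_bus((void *)vstart);
            return (void *)vstart;
    }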
>>> On 28.11.11 at 17:45, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But I can't find the implementation of that in the classic Xen-SWIOTLB.

linux-2.6.18-xen.hg/arch/i386/kernel/pci-dma-xen.c:dma_alloc_coherent().

Jan
I attached the actually used 2.6.34 file here, if that helps.

BR, C.
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux; it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Yes, it's a 2.6.34 kernel with Andrew Lyon's backported patches found here:
http://code.google.com/p/gentoo-xen-kernel/downloads/list

GrC.
> > - with load I mean the %CPU of xentop
> > - there is no change in CPU usage of the DomU or Dom0
>
> Uhh, which metric are you using for that? CPU usage...? Is this when you change
> the DomU or the amount of memory the guest has? This is not the load number
> (xentop value)?

I had a quick look into the munin plugin. It reads the output of "xm li",
the Time in seconds, and normalizes it. But the effect is also visible in the
CPU(%) column of xentop, if the DomU is on higher load.

BR, C.
> Carsten, let me prep up a patch that will print some diagnostic information
> during runtime - to see how often it does the bounce, the usage, etc.

Yup, looking forward to it. I can include it into any kernel. 2.6.18 would be a
bit difficult though, as the driver pack isn't compatible any longer... so I'd
prefer 2.6.34 Xenified vs. 3.1.2 pvops.

BR, C.
On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > mainline Linux; it's been heavily refactored for one thing. It's not
> > impossible that mainline is bouncing something it doesn't really need
> > to.
>
> The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
> being done. The alloc_coherent will allocate a nice page underneath the 4GB
> mark and give it to the driver. The driver can use it as it wishes and there
> is no need to bounce buffer.

Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
subset of swiotlb is in use then; all the bouncing stuff _should_ be
idle/unused -- but has that been confirmed?
Konrad Rzeszutek Wilk
2011-Nov-29 15:33 UTC
Re: Load increase after memory upgrade (part2)
On Tue, Nov 29, 2011 at 10:23:18AM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> > The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
> > being done. The alloc_coherent will allocate a nice page underneath the 4GB
> > mark and give it to the driver. The driver can use it as it wishes and there
> > is no need to bounce buffer.
>
> Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> subset of swiotlb is in use then; all the bouncing stuff _should_ be
> idle/unused -- but has that been confirmed?

Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
Now I just need to find a moment to write it :-)
Konrad Rzeszutek Wilk
2011-Dec-02 15:23 UTC
Re: Load increase after memory upgrade (part2)
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then; all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
>
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done! Carsten, can you please patch your kernel with this hacky patch, and when
you have booted the new kernel, just do

    modprobe dump_swiotlb

It should give an idea of how many bounces are happening, coherent allocations,
syncs, and so on... along with the last driver that did those operations.
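The hacky patch itself is not included in the archived thread. Judging only from the output that follows, it presumably keeps per-operation counters in the swiotlb paths plus a kernel thread that prints them every five seconds. A very rough sketch of that idea; the swiotlb_dbg_*/dump_swiotlb names are guesses, not Konrad's actual code:

    /* Hypothetical sketch of swiotlb instrumentation: counters bumped from the
     * bounce/map/unmap/sync paths (hooks not shown) and a kthread that reports
     * them every 5 seconds. Not the patch posted in this thread. */
    #include <linux/kernel.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/atomic.h>
    #include <linux/module.h>

    static atomic_t swiotlb_dbg_bounce_from, swiotlb_dbg_bounce_to;
    static atomic_t swiotlb_dbg_map, swiotlb_dbg_unmap, swiotlb_dbg_sync;

    /* Example hook, to be called from the swiotlb bounce path. */
    void swiotlb_dbg_count_bounce_from(void)
    {
            atomic_inc(&swiotlb_dbg_bounce_from);
    }

    static int dump_swiotlb_thread(void *unused)
    {
            while (!kthread_should_stop()) {
                    msleep(5000);
                    /* atomic_xchg() reads and resets each per-interval count. */
                    printk(KERN_INFO "bounce: from:%d to:%d map:%d unmap:%d sync:%d\n",
                           atomic_xchg(&swiotlb_dbg_bounce_from, 0),
                           atomic_xchg(&swiotlb_dbg_bounce_to, 0),
                           atomic_xchg(&swiotlb_dbg_map, 0),
                           atomic_xchg(&swiotlb_dbg_unmap, 0),
                           atomic_xchg(&swiotlb_dbg_sync, 0));
            }
            return 0;
    }

    static int __init dump_swiotlb_init(void)
    {
            kthread_run(dump_swiotlb_thread, NULL, "dump_swiotlb");
            return 0;
    }
    module_init(dump_swiotlb_init);
    MODULE_LICENSE("GPL");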
Thank you, Konrad. I applied the patch to 3.1.2. In order to have a clear picture,
I only enabled one PCI card. The result is:

[ 28.028032] Starting SWIOTLB debug thread.
[ 28.028076] swiotlb_start_thread: Go!
[ 28.028622] xen_swiotlb_start_thread: Go!
[ 33.028153] 0 [budget_av 0000:00:00.0] bounce: from:555352(slow:0)to:0 map:329 unmap:0 sync:555352
[ 33.028294] SWIOTLB is 2% full
[ 38.028178] 0 budget_av 0000:00:00.0 alloc coherent: 4, free: 0
[ 38.028230] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 38.028352] SWIOTLB is 2% full
[ 43.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.028310] SWIOTLB is 2% full
[ 48.028199] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 48.028334] SWIOTLB is 2% full
[ 53.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 53.028309] SWIOTLB is 2% full
[ 58.028138] 0 [budget_av 0000:00:00.0] bounce: from:126994(slow:0)to:0 map:0 unmap:0 sync:126994
[ 58.028195] SWIOTLB is 2% full
[ 63.028170] 0 [budget_av 0000:00:00.0] bounce: from:121401(slow:0)to:0 map:0 unmap:0 sync:121401
[ 63.029560] SWIOTLB is 2% full
[ 68.028193] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 68.028329] SWIOTLB is 2% full
[ 73.028104] 0 [budget_av 0000:00:00.0] bounce: from:122717(slow:0)to:0 map:0 unmap:0 sync:122717
[ 73.028244] SWIOTLB is 2% full
[ 78.028191] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[ 78.028331] SWIOTLB is 2% full
[ 83.028112] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 83.028171] SWIOTLB is 2% full

Was that long enough? I hope this helps.

Carsten.
Here with two cards enabled and creating a bit of "work" by watching TV with one of them:

[ 23.842720] Starting SWIOTLB debug thread.
[ 23.842750] swiotlb_start_thread: Go!
[ 23.842838] xen_swiotlb_start_thread: Go!
[ 28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
[ 28.841592] SWIOTLB is 4% full
[ 33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 33.840283] SWIOTLB is 4% full
[ 33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
[ 38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 38.840361] SWIOTLB is 4% full
[ 43.840182] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 43.840323] SWIOTLB is 4% full
[ 48.840094] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[ 48.840154] SWIOTLB is 4% full
[ 53.840160] 0 [budget_av 0000:00:01.0] bounce: from:119756(slow:0)to:0 map:0 unmap:0 sync:119756
[ 53.840301] SWIOTLB is 4% full
[ 58.840202] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 58.840339] SWIOTLB is 4% full
[ 63.840626] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[ 63.840686] SWIOTLB is 4% full
[ 68.840122] 0 [budget_av 0000:00:01.0] bounce: from:127323(slow:0)to:0 map:0 unmap:0 sync:127323
[ 68.840180] SWIOTLB is 4% full
[ 73.840647] 0 [budget_av 0000:00:01.0] bounce: from:211547(slow:0)to:0 map:0 unmap:0 sync:211547
[ 73.840784] SWIOTLB is 4% full
[ 78.840204] 0 [budget_av 0000:00:01.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 78.840344] SWIOTLB is 4% full
[ 83.840114] 0 [budget_av 0000:00:01.0] bounce: from:255304(slow:0)to:0 map:0 unmap:0 sync:255304
[ 83.840178] SWIOTLB is 4% full
[ 88.840158] 0 [budget_av 0000:00:01.0] bounce: from:256620(slow:0)to:0 map:0 unmap:0 sync:256620
[ 88.840302] SWIOTLB is 4% full
[ 93.840185] 0 [budget_av 0000:00:00.0] bounce: from:250040(slow:0)to:0 map:0 unmap:0 sync:250040
[ 93.840319] SWIOTLB is 4% full
[ 98.840181] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 98.841563] SWIOTLB is 4% full
[ 103.841221] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 103.841361] SWIOTLB is 4% full
[ 108.840247] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[ 108.840389] SWIOTLB is 4% full
[ 113.840157] 0 [budget_av 0000:00:00.0] bounce: from:261555(slow:0)to:0 map:0 unmap:0 sync:261555
[ 113.840298] SWIOTLB is 4% full
[ 118.840119] 0 [budget_av 0000:00:00.0] bounce: from:295442(slow:0)to:0 map:0 unmap:0 sync:295442
[ 118.840259] SWIOTLB is 4% full
[ 123.841025] 0 [budget_av 0000:00:00.0] bounce: from:295113(slow:0)to:0 map:0 unmap:0 sync:295113
[ 123.841164] SWIOTLB is 4% full
[ 128.840175] 0 [budget_av 0000:00:00.0] bounce: from:294784(slow:0)to:0 map:0 unmap:0 sync:294784
[ 128.840310] SWIOTLB is 4% full
[ 133.840194] 0 [budget_av 0000:00:00.0] bounce: from:293797(slow:0)to:0 map:0 unmap:0 sync:293797
[ 133.840330] SWIOTLB is 4% full
[ 138.840498] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 138.840637] SWIOTLB is 4% full
[ 143.840173] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 143.840313] SWIOTLB is 4% full
[ 148.840215] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 148.840355] SWIOTLB is 4% full
[ 153.840205] 0 [budget_av 0000:00:01.0] bounce: from:329658(slow:0)to:0 map:0 unmap:0 sync:329658
[ 153.840341] SWIOTLB is 4% full
[ 158.840137] 0 [budget_av 0000:00:00.0] bounce: from:342160(slow:0)to:0 map:0 unmap:0 sync:342160
[ 158.840277] SWIOTLB is 4% full
[ 163.841288] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 163.841424] SWIOTLB is 4% full
[ 168.840198] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 168.840339] SWIOTLB is 4% full
[ 173.840167] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 173.840304] SWIOTLB is 4% full
[ 178.840184] 0 [budget_av 0000:00:00.0] bounce: from:328013(slow:0)to:0 map:0 unmap:0 sync:328013
[ 178.840324] SWIOTLB is 4% full
[ 183.840129] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 183.840269] SWIOTLB is 4% full
[ 188.840123] 0 [budget_av 0000:00:01.0] bounce: from:340515(slow:0)to:0 map:0 unmap:0 sync:340515
[ 188.841647] SWIOTLB is 4% full
[ 193.840192] 0 [budget_av 0000:00:00.0] bounce: from:338541(slow:0)to:0 map:0 unmap:0 sync:338541
[ 193.840329] SWIOTLB is 4% full
[ 198.840148] 0 [budget_av 0000:00:01.0] bounce: from:330316(slow:0)to:0 map:0 unmap:0 sync:330316
[ 198.840230] SWIOTLB is 4% full
[ 203.840860] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[ 203.841000] SWIOTLB is 4% full
[ 208.840562] 0 [budget_av 0000:00:01.0] bounce: from:337883(slow:0)to:0 map:0 unmap:0 sync:337883
[ 208.840698] SWIOTLB is 4% full
[ 213.840171] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[ 213.840311] SWIOTLB is 4% full
[ 218.840214] 0 [budget_av 0000:00:01.0] bounce: from:320117(slow:0)to:0 map:0 unmap:0 sync:320117
[ 218.840354] SWIOTLB is 4% full
[ 223.840238] 0 [budget_av 0000:00:01.0] bounce: from:299390(slow:0)to:0 map:0 unmap:0 sync:299390
[ 223.840373] SWIOTLB is 4% full
[ 228.841415] 0 [budget_av 0000:00:01.0] bounce: from:298732(slow:0)to:0 map:0 unmap:0 sync:298732
[ 228.841560] SWIOTLB is 4% full
[ 233.840705] 0 [budget_av 0000:00:00.0] bounce: from:299061(slow:0)to:0 map:0 unmap:0 sync:299061
[ 233.840844] SWIOTLB is 4% full
[ 238.840145] 0 [budget_av 0000:00:01.0] bounce: from:293468(slow:0)to:0 map:0 unmap:0 sync:293468
[ 238.840280] SWIOTLB is 4% full

Carsten.
I should perhaps mention that I create the DomU with only the parameter iommu=soft.
I hope nothing more is required. For the Xenified kernel, it's swiotlb=32,force.

Carsten.
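For context, the relevant bits of such a PV DomU configuration would look roughly like this in xm/xend syntax; the PCI addresses and sizes below are examples, not Carsten's actual config:

    # Guest kernel command line: use the software swiotlb for the
    # passed-through cards
    extra  = "iommu=soft"

    # PCI passthrough of the two DVB-C cards (example BDF addresses)
    pci    = [ '04:00.0', '05:00.0' ]

    memory = 1024
    vcpus  = 1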
Konrad Rzeszutek Wilk
2011-Dec-06 03:26 UTC
Re: Load increase after memory upgrade (part2)
On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
>
> [   23.842720] Starting SWIOTLB debug thread.
> [   23.842750] swiotlb_start_thread: Go!
> [   23.842838] xen_swiotlb_start_thread: Go!
> [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> [   28.841592] SWIOTLB is 4% full
> [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> [   33.840283] SWIOTLB is 4% full
> [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310

Whoa. Yes. You are definitely using the bounce buffer :-)

Now it is time to look at why the driver is not using those coherent ones - it
looks to allocate just eight of them but does not use them.. Unless it is
using them _and_ bouncing them (which would be odd).

And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
are using of the default size.

I should find out _why_ the old Xen kernels do not use the bounce buffer
so much...
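To make the distinction above concrete, here is a minimal sketch (not from the thread, and not the budget_av code) of the two DMA paths a driver can take; the struct and function names (demo_dev, demo_setup) are invented for illustration. The coherent buffer never bounces, while every map/sync/unmap of the streaming buffer is exactly what the debug counters above are counting.

    #include <linux/dma-mapping.h>
    #include <linux/pci.h>
    #include <linux/slab.h>

    struct demo_dev {
            struct pci_dev *pdev;
            void *ring;             /* coherent buffer: always device-reachable */
            dma_addr_t ring_dma;
            void *pkt;              /* ordinary kernel memory: may get bounced */
            dma_addr_t pkt_dma;
    };

    static int demo_setup(struct demo_dev *d, size_t ring_bytes, size_t pkt_bytes)
    {
            /* Coherent allocation: the DMA layer (Xen swiotlb included) returns
             * memory below the device's DMA mask, so it is never bounced. */
            d->ring = dma_alloc_coherent(&d->pdev->dev, ring_bytes,
                                         &d->ring_dma, GFP_KERNEL);
            if (!d->ring)
                    return -ENOMEM;

            /* Streaming mapping: whatever buffer the driver already has gets
             * mapped per I/O.  If it sits above the mask (e.g. >4GB for a
             * 32-bit card), swiotlb copies it into its bounce pool here... */
            d->pkt = kmalloc(pkt_bytes, GFP_KERNEL);
            if (!d->pkt)
                    return -ENOMEM;  /* sketch: ring cleanup omitted */
            d->pkt_dma = dma_map_single(&d->pdev->dev, d->pkt, pkt_bytes,
                                        DMA_FROM_DEVICE);

            /* ...and every sync/unmap copies it back - the "bounce"/"sync"
             * counters in the debug output above. */
            dma_sync_single_for_cpu(&d->pdev->dev, d->pkt_dma, pkt_bytes,
                                    DMA_FROM_DEVICE);
            dma_unmap_single(&d->pdev->dev, d->pkt_dma, pkt_bytes, DMA_FROM_DEVICE);
            return 0;
    }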
Konrad Rzeszutek Wilk
2011-Dec-14 20:23 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
> >
> > [   23.842720] Starting SWIOTLB debug thread.
> > [   23.842750] swiotlb_start_thread: Go!
> > [   23.842838] xen_swiotlb_start_thread: Go!
> > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > [   28.841592] SWIOTLB is 4% full
> > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > [   33.840283] SWIOTLB is 4% full
> > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
>
> Whoa. Yes. You are definitely using the bounce buffer :-)
>
> Now it is time to look at why the driver is not using those coherent ones - it
> looks to allocate just eight of them but does not use them.. Unless it is
> using them _and_ bouncing them (which would be odd).
>
> And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
> are using of the default size.

So I am able to see this with an atl1c ethernet driver on my SandyBridge i3
box. It looks as if the card is truly 32-bit, so on a box with 8GB it
bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
bounces (no surprise there).

In other words - I see the same behavior you are seeing. Now off to:

> > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > so much...

which will require some fiddling around.
Konrad Rzeszutek Wilk
2011-Dec-14 22:07 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Dec 14, 2011 at 04:23:51PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > > Here with two cards enabled and creating a bit of "work" by watching TV with one of them:
> > >
> > > [   23.842720] Starting SWIOTLB debug thread.
> > > [   23.842750] swiotlb_start_thread: Go!
> > > [   23.842838] xen_swiotlb_start_thread: Go!
> > > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > > [   28.841592] SWIOTLB is 4% full
> > > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > > [   33.840283] SWIOTLB is 4% full
> > > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
> >
> > Whoa. Yes. You are definitely using the bounce buffer :-)
> >
> > Now it is time to look at why the driver is not using those coherent ones - it
> > looks to allocate just eight of them but does not use them.. Unless it is
> > using them _and_ bouncing them (which would be odd).
> >
> > And BTW, you can lower your 'swiotlb=XX' value. The 4% is how much you
> > are using of the default size.
>
> So I am able to see this with an atl1c ethernet driver on my SandyBridge i3
> box. It looks as if the card is truly 32-bit, so on a box with 8GB it
> bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
> bounces (no surprise there).
>
> In other words - I see the same behavior you are seeing. Now off to:
>
> > > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > > so much...
>
> which will require some fiddling around.

And I am not seeing any difference - the swiotlb is used with the same usage
when booting a classic (old-style XenoLinux) 2.6.32 vs. a brand new pvops (3.2).
Obviously, if I limit the physical amount of memory (so 'mem=4GB' on the Xen
hypervisor line), the bounce usage disappears.

Hmm, I wonder if there is a nice way to tell the hypervisor - hey, please
stuff dom0 under 4GB.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?
...
> which will require some fiddling around.
>
> Here is the patch I used against classic XenLinux. Any chance you could run
> it with your classic guests and see what numbers you get?

Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.

Carsten.
Well, it will do nothing but print out “SWIOTLB is 0% full”. Does that help? Or do you think something went wrong with the patch… BR, Carsten. Von: Carsten Schiers Gesendet: Donnerstag, 15. Dezember 2011 15:53 An: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel Betreff: AW: [Xen-devel] Load increase after memory upgrade (part2) ...> which will require some fiddling around.Here is the patch I used against classic XenLinux. Any chance you could run it with your classis guests and see what numbers you get? Sure, it might take a bit, but I''ll try it with my 2.6.34 classic kernel. Carsten. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Dec-16 15:04 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Dec 16, 2011 at 03:56:10PM +0100, Carsten Schiers wrote:> Well, it will do nothing but print out “SWIOTLB is 0% full”. > > > Does that help? Or do you think something went wrong with the patch… >And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? Could you append the dmesg output please? Thanks.> > BR, > > Carsten. > > > > > Von: Carsten Schiers > Gesendet: Donnerstag, 15. Dezember 2011 15:53 > An: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk > Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel > Betreff: AW: [Xen-devel] Load increase after memory upgrade (part2) > > > ... > > > which will require some fiddling around. > > Here is the patch I used against classic XenLinux. Any chance you could run > it with your classis guests and see what numbers you get? > > Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel. > > > Carsten. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?

Yes, two of them with swiotlb=32,force.

> Could you append the dmesg output please?

Attached. You will find a "normal" boot after the one with the patched kernel.

Carsten.
Konrad Rzeszutek Wilk
2011-Dec-16 16:19 UTC
Re: Load increase after memory upgrade (part2)
On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? > > Yes, two of them with swiotlb=32,force. > > > > Could you append the dmesg output please? > > Attached. You find a "normal" boot after the one with the patched kernel.Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup? Thanks for being willing to be a guinea pig while trying to fix this.> > Carsten. > >
OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance to check that the patch is working? Does it print out something else with your setting? BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Freitag, 16. Dezember 2011 17:19 An: Carsten Schiers Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? > > Yes, two of them with swiotlb=32,force. > > > > Could you append the dmesg output please? > > Attached. You find a "normal" boot after the one with the patched kernel.Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup? Thanks for being willing to be a guinea pig while trying to fix this.> > Carsten. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... I''m not having much time now, hoping to get back with a full report soon. -- Sander Saturday, December 17, 2011, 11:12:45 PM, you wrote:> OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance > to check that the patch is working? Does it print out something else with your setting? BR, Carsten.> -----Ursprüngliche Nachricht----- > Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk > Gesendet: Freitag, 16. Dezember 2011 17:19 > An: Carsten Schiers > Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell > Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2)> On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote: >> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it? >> >> Yes, two of them with swiotlb=32,force. >> >> >> > Could you append the dmesg output please? >> >> Attached. You find a "normal" boot after the one with the patched kernel.> Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup?> Thanks for being willing to be a guinea pig while trying to fix this. >> >> Carsten. >> >>> _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel-- Best regards, Sander mailto:linux@eikelenboom.it
Konrad Rzeszutek Wilk
2011-Dec-19 14:54 UTC
Re: Load increase after memory upgrade (part2)
On Sat, Dec 17, 2011 at 11:12:45PM +0100, Carsten Schiers wrote:
> OK, double checked. Both PCI cards enabled, running, working, but nothing but
> "SWIOTLB is 0% full". Any chance to check that the patch is working? Does it
> print out something else with your setting? BR, Carsten.

Hm, and with the pvops kernel you got some numbers along with tons of 'bounce'.

The one thing that I neglected in this patch is the alloc_coherent part..
which I don't think is that important, as we did show that the alloc buffers
are used.

I don't have anything concrete yet, but after the holidays I should have a
better idea of what is happening. Thanks for being willing to test this!
Konrad Rzeszutek Wilk
2011-Dec-19 14:56 UTC
Re: Load increase after memory upgrade (part2)
On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:> I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... > I''m not having much time now, hoping to get back with a full report soon.Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect when running as PV guest .. Will look in more details after the holidays. Thanks for being willing to try it out.
Konrad Rzeszutek Wilk
2012-Jan-10 21:55 UTC
Re: Load increase after memory upgrade (part2)
On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote: > > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... > > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... > > I''m not having much time now, hoping to get back with a full report soon. > > Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect > when running as PV guest .. Will look in more details after the > holidays. Thanks for being willing to try it out.Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU: [ 771.896140] SWIOTLB is 11% full [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0 [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0 [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0 but interestingly enough, if I boot the guest as the first one I do not get these bounce requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same numbers.
Hello Konrad,

Tuesday, January 10, 2012, 10:55:33 PM, you wrote:

> On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
>> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers, in dom0 I get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> > I'm not having much time now, hoping to get back with a full report soon.
>>
>> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> when running as PV guest .. Will look in more details after the
>> holidays. Thanks for being willing to try it out.

> Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:

> [  771.896140] SWIOTLB is 11% full
> [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

> but interestingly enough, if I boot the guest as the first one I do not get these bounce
> requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> numbers.

I started to experiment some more with what I encountered.

On dom0 I was seeing that my r8169 ethernet controllers were using bounce buffering with the dump-swiotlb module.
It was showing "12% full".
Checking in sysfs shows:

serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
32

If I remember correctly, wasn't the allocation for dom0 changed to be at the top of memory instead of low, somewhere between 2.6.32 and 3.0?
Could that change cause all devices to need bounce buffering, and could it therefore explain some people seeing more CPU usage for dom0?

I have forced my r8169 to use a 64-bit DMA mask (using use_dac=1):

serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
64

This results in dump-swiotlb reporting:

[ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1265.625043] SWIOTLB is 0% full
[ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
[ 1270.635024] SWIOTLB is 0% full
[ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1275.644261] SWIOTLB is 0% full
[ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

So it has changed from 12% to 0%, although it still reports something about bouncing? Or am I misinterpreting stuff?

Another thing I was wondering about: couldn't the hypervisor offer a small window in 32-bit addressable memory to all domUs (or only when PCI passthrough is used) to be used for DMA?

(Oh yes, I haven't got a clue what I'm talking about ... so it probably makes no sense at all :-) )

--
Sander
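As an aside (an illustration, not the actual r8169 code): what use_dac=1 effectively does inside a driver is the mask negotiation sketched below - try the 64-bit DMA mask and fall back to 32-bit. A device left at the 32-bit mask is what forces swiotlb to bounce any buffer whose machine address is above 4GB; the probe function name is a placeholder.

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int demo_probe_dma(struct pci_dev *pdev, bool use_dac)
    {
            /* With a 64-bit mask the DMA layer trusts the device to reach any
             * address, so swiotlb never has to bounce for it. */
            if (use_dac && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
                    return 0;

            /* Otherwise fall back to 32-bit: any buffer whose machine address
             * lies above 4GB will be bounced through the swiotlb pool. */
            if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) ||
                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            return 0;
    }

The dma_mask_bits / consistent_dma_mask_bits files in sysfs simply reflect whichever mask the driver ended up setting here.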
>>> On 12.01.12 at 23:06, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Another thing i was wondering about, couldn't the hypervisor offer a small
> window in 32bit addressable mem to all (or only when pci passthrough is used)
> domU's to be used for DMA ?

How would use of such a range be arbitrated/protected? You'd have to ask for
reservation (aka allocation) of a chunk anyway, which is as good as using the
existing interfaces to obtain address-restricted memory (and the hypervisor
has a [rudimentary] mechanism to preserve some low memory for DMA
allocations).

Jan
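For context on the "existing interfaces to obtain address restricted memory": in the pvops kernels of this era, the kernel-side primitive is xen_create_contiguous_region(), which is what xen-swiotlb itself uses to pull its bounce pool under 4GB. A rough sketch of using it directly follows; the chunk size is arbitrary, error handling is minimal, and the call signature assumed is the 2.6.32-3.2 one.

    #include <linux/gfp.h>
    #include <xen/xen-ops.h>

    #define DEMO_ORDER 4    /* 64KB chunk, arbitrary for illustration */

    static void *demo_alloc_low_dma_chunk(void)
    {
            void *buf = (void *)__get_free_pages(GFP_KERNEL, DEMO_ORDER);

            if (!buf)
                    return NULL;

            /* Ask the hypervisor to swap the machine frames behind this
             * pseudo-physical range for frames that are contiguous and lie
             * below 2^32, i.e. reachable by a 32-bit DMA engine. */
            if (xen_create_contiguous_region((unsigned long)buf, DEMO_ORDER, 32)) {
                    free_pages((unsigned long)buf, DEMO_ORDER);
                    return NULL;
            }
            return buf;
    }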
Konrad Rzeszutek Wilk
2012-Jan-13 15:13 UTC
Re: Load increase after memory upgrade (part2)
> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers, in dom0 I get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> >> > I'm not having much time now, hoping to get back with a full report soon.
> >>
> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> >> when running as PV guest .. Will look in more details after the
> >> holidays. Thanks for being willing to try it out.
>
> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
>
> > [  771.896140] SWIOTLB is 11% full
> > [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> > [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> > [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
>
> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> > numbers.
>
> I started to experiment some more with what i encountered.
>
> On dom0 i was seeing that my r8169 ethernet controllers were using bounce buffering with the dump-swiotlb module.
> It was showing "12% full".
> Checking in sysfs shows:
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 32
>
> If i remember correctly wasn't the allocation for dom0 changed to be at the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

We never actually had dom0 support in the upstream kernel until 2.6.37. The
2.6.32<->2.6.36 you are referring to must have been the trees that I spun up -
but the implementation of SWIOTLB in them had not really changed.

> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?

The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in
domU with guests. And that the older domUs (XenOLinux) do not have this.

That I can't understand - the implementation in both cases _looks_ to do the
same thing. There was one issue I found in the upstream one, but even with
that fix I still get that "bounce" usage in domU.

Interestingly enough, I get that only if I have launched, destroyed, launched,
etc., the guest multiple times before I get this. Which leads me to believe
this is not a kernel issue, but that we have simply fragmented the Xen memory
so much that when it launches the guest all of the memory is above 4GB. But
that seems counter-intuitive, as by default Xen starts guests at the far end
of memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB
roughly). The SWIOTLB swizzles some memory under the 4GB mark, and this is
where we get the bounce buffer effect (as the memory under 4GB is then copied
to the memory at 12GB->16GB).

But it does not explain why on the first couple of starts I did not see this
with pvops. And it does not seem to happen with the XenOLinux kernel, so there
must be something else in here.

> I have forced my r8169 to use a 64-bit dma mask (using use_dac=1)

Ah yes.

> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 64
>
> This results in dump-swiotlb reporting:
>
> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1265.625043] SWIOTLB is 0% full
> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
> [ 1270.635024] SWIOTLB is 0% full
> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1275.644261] SWIOTLB is 0% full
> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

Which is what we expect. No need to bounce, since the PCI adapter can reach
memory above the 4GB mark.

> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

The bouncing can happen due to two cases:
 - Memory is above 4GB.
 - Memory crosses a page boundary (rarely happens).

> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its
pool. But it can't do it for every part of memory. That is why there are DMA
pools, which are used by graphics adapters, video capture devices, storage and
network drivers. They are used for small packet sizes, so that the driver does
not have to allocate DMA buffers when it gets a 100-byte ping response. But
for large packets (say that ISO file you are downloading) it allocates memory
on the fly and "maps" it into the PCI space using the DMA API. That "mapping"
sets up a "physical memory" -> "guest memory" translation - and if that
allocated memory is above 4GB, part of this mapping is to copy ("bounce") the
memory under the 4GB mark (where the Xen SWIOTLB has allocated a pool), so
that the adapter can physically fetch/put the data. Once that is completed, it
is "sync"-ed back, which is bouncing that data to the "allocated memory".

So having a DMA pool is very good - and most drivers use it. The things I
can't figure out are:
 - why the DVB drivers do not seem to use it, even though they look to use the
   videobuf_dma driver;
 - why the XenOLinux kernel does not seem to have this problem (and this might
   be false - perhaps it does have this problem and it just takes a couple of
   guest launches, destructions, starts, etc. to actually see it);
 - are there any flags in the domain builder to say: "ok, this domain is going
   to service 32-bit cards, hence build the memory from 0->4GB". This seems
   like a good knob at first, but it probably is a bad idea (imagine using it
   by mistake on every guest). And also nowadays most cards are PCIe and they
   can do 64-bit, so it would not be that important in the future.

> (oh yes, i haven't got a clue what i'm talking about ... so it probably makes no sense at all :-) )

Nonsense. You were on the correct path. Hopefully the level of details hasn't
scared you off now :-)

> --
> Sander
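The "DMA pool" pattern described above looks roughly like the sketch below in a driver (the demo_* names are placeholders, not taken from any of the drivers discussed): descriptors come out of a pool that already sits below the device's DMA mask, so per-packet traffic through it never needs the bounce/sync cycle.

    #include <linux/dmapool.h>
    #include <linux/pci.h>

    struct demo_desc_ring {
            struct dma_pool *pool;
            void *desc;
            dma_addr_t desc_dma;
    };

    static int demo_ring_init(struct pci_dev *pdev, struct demo_desc_ring *r)
    {
            /* One pool of 256-byte, 16-byte-aligned blocks, created once. */
            r->pool = dma_pool_create("demo-desc", &pdev->dev, 256, 16, 0);
            if (!r->pool)
                    return -ENOMEM;

            /* Every allocation from the pool comes back with a bus address the
             * card can use directly - no per-I/O map/bounce/sync cycle. */
            r->desc = dma_pool_alloc(r->pool, GFP_KERNEL, &r->desc_dma);
            if (!r->desc) {
                    dma_pool_destroy(r->pool);
                    return -ENOMEM;
            }
            return 0;
    }

    static void demo_ring_fini(struct demo_desc_ring *r)
    {
            dma_pool_free(r->pool, r->desc, r->desc_dma);
            dma_pool_destroy(r->pool);
    }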
Friday, January 13, 2012, 4:13:07 PM, you wrote:>> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ... >> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ... >> >> > I''m not having much time now, hoping to get back with a full report soon. >> >> >> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect >> >> when running as PV guest .. Will look in more details after the >> >> holidays. Thanks for being willing to try it out. >> >> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU: >> >> > [ 771.896140] SWIOTLB is 11% full >> > [ 776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0 >> > [ 776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0 >> > [ 776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0 >> >> > but interestingly enough, if I boot the guest as the first one I do not get these bounce >> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same >> > numbers. >> >> >> I started to expiriment some more with what i encountered. >> >> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module. >> It was showing "12% full". >> Checking in sysfs shows: >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits >> 32 >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits >> 32 >> >> If i remember correctly wasn''t the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?> ? We never actually had dom0 support in the upstream kernel until 2.6.37.. The 2.6.32<->2.6.36 you are > referring to must have been the trees that I spun up - but the implementation of SWIOTLB in them > had not really changed.>> Could that change cause the need for all devices to need bounce buffering and could it therefore explain some people seeing more cpu usage for dom0 ?> The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in domU with guests. > And that the older domU''s (XenOLinux) do not have this.> That I can''t understand - the implementation in both cases _looks_ to do the same thing. > There was one issue I found in the upstream one, but even with that fix I still > get that "bounce" usage in domU.> Interestingly enough, I get that only if I have launched, destroyed, launched, etc, the guest multiple > times before I get this. Which leads me to believe this is not a kernel issue but that we > are simply fragmented the Xen memory so much, so that when it launches the guest all of the > memory is above 4GB. But that seems counter-intuive as by default Xen starts guests at the far end of > memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB roughly). The SWIOTLB > swizzles some memory under the 4GB , and this is where we get the bounce buffer effect > (as the memory from 4GB is then copied to the memory 12GB->16GB).> But it does not explain why on the first couple of starts I did not see this with pvops. > And it does not seem to happen with the XenOLinux kernel, so there must be something else > in here.>> >> I have forced my r8169 to use 64bits dma mask (using use_dac=1)> Ah yes. 
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits >> 32 >> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits >> 64 >> >> This results in dump-swiotlb reporting: >> >> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10 >> [ 1265.625043] SWIOTLB is 0% full >> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12 >> [ 1270.635024] SWIOTLB is 0% full >> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10 >> [ 1275.644261] SWIOTLB is 0% full >> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10> Which is what we expect. No need to bounce since the PCI adapter can reach memory > above the 4GB mark.>> >> >> >> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?> The bouncing can happen due to two cases: > - Memory is above 4GB > - Memory crosses a page-boundary (rarely happens). >> >> >> Another thing i was wondering about, couldn''t the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU''s to be used for DMA ?> It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its pool. > But it can''t do it for every part of memory. That is why there are DMA pools > which are used by graphics adapters, video capture devices,storage and network > drivers. They are used for small packet sizes so that the driver does not have > to allocate DMA buffers when it gets a 100bytes ping response. But for large > packets (say that ISO file you are downloading) it allocates memory on the fly > and "maps" it into the PCI space using the DMA API. That "mapping" sets up > an "physical memory" -> "guest memory" translation - and if that allocated > memory is above 4GB, part of this mapping is to copy ("bounce") the memory > under the 4GB (where XenSWIOTLB has allocated a pool), so that the adapter > can physically fetch/put the data. Once that is completed it is "sync"-ed > back, which is bouncing that data to the "allocated memory".> So having a DMA pool is very good - and most drivers use it. The thing I can''t > figure out is: > - why the DVB do not seem to use it, even thought they look to use the videobuf_dma > driver. > - why the XenOLinux does not seem to have this problem (and this might be false - > perhaps it does have this problem and it just takes a couple of guest launches, > destructions, starts, etc to actually see it). > - are there any flags in the domain builder to say: "ok, this domain is going to > service 32-bit cards, hence build the memory from 0->4GB". This seems like > a good know at first, but it probably is a bad idea (imagine using it by mistake > on every guest). And also nowadays most cards are PCIe and they can do 64-bit, so > it would not be that important in the future. >> >> (oh yes, i haven''t got i clue what i''m talking about ... so it probably make no sense at all :-) )> Nonsense. You were on the correct path . Hopefully the level of details hasn''t > scared you off now :-)Well it only gives some more questions :-) The thing is, pci passthrough and especially the DMA part of it, all work behind the scenes without giving much output about the way it is actually working. The thing i was wondering about is if my AMD IOMMU is actually doing something for PV guests. 
When booting with iommu=off machine has 8GB mem, dom0 limited to 1024M and just starting one domU with iommu=soft, with pci-passthrough and the USB pci-cards with USB videograbbers attached to it, i would expect to find some bounce buffering going. (HV_START_LOW 18446603336221196288) (FEATURES ''!writable_page_tables|pae_pgdir_above_4gb'') (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux) (HYPERCALL_PAGE 18446744071578849280) (LOADER generic) (SUSPEND_CANCEL 1) (PAE_MODE yes) (ENTRY 18446744071594476032) (XEN_VERSION xen-3.0) Still i only see: [ 47.449072] Starting SWIOTLB debug thread. [ 47.449090] swiotlb_start_thread: Go! [ 47.449262] xen_swiotlb_start_thread: Go! [ 52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0 [ 52.449180] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:16 map:23 unmap:0 sync:0 [ 52.449187] 2 [ohci_hcd 0000:0a:00.4] bounce: from:0(slow:0)to:4 map:5 unmap:0 sync:0 [ 52.449226] SWIOTLB is 0% full [ 57.449180] 0 ehci_hcd 0000:0a:00.3 alloc coherent: 35, free: 0 [ 57.449219] 1 ohci_hcd 0000:0a:00.6 alloc coherent: 1, free: 0 [ 57.449265] SWIOTLB is 0% full [ 62.449176] SWIOTLB is 0% full [ 67.449336] SWIOTLB is 0% full [ 72.449279] SWIOTLB is 0% full [ 77.449121] SWIOTLB is 0% full [ 82.449236] SWIOTLB is 0% full [ 87.449242] SWIOTLB is 0% full [ 92.449241] SWIOTLB is 0% full [ 172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0 [ 172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0 [ 172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0 [ 172.449170] SWIOTLB is 0% full [ 177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0 [ 177.449131] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:76 map:112 unmap:0 sync:0 [ 177.449138] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:4 map:6 unmap:0 sync:0 [ 177.449178] SWIOTLB is 0% full [ 182.449143] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5349(slow:0)to:563 map:5899 unmap:5949 sync:0 [ 182.449157] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:27 map:35 unmap:0 sync:0 [ 182.449164] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:15 unmap:0 sync:0 [ 182.449204] SWIOTLB is 0% full [ 187.449112] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5375(slow:0)to:592 map:5941 unmap:6022 sync:0 [ 187.449126] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:46 map:69 unmap:0 sync:0 [ 187.449133] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:9 map:12 unmap:0 sync:0 [ 187.449173] SWIOTLB is 0% full [ 192.449183] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5360(slow:0)to:556 map:5890 unmap:5978 sync:0 [ 192.449226] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:52 map:74 unmap:0 sync:0 [ 192.449234] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:14 unmap:0 sync:0 [ 192.449275] SWIOTLB is 0% full And the devices do work ... so how does that work ... Thx for your explanation so far ! -- Sander>> >> >> -- >> Sander >> >>-- Best regards, Sander mailto:linux@eikelenboom.it
Konrad Rzeszutek Wilk
2012-Jan-17 21:02 UTC
Re: Load increase after memory upgrade (part2)
> The thing i was wondering about is if my AMD IOMMU is actually doing something for PV guests.
> When booting with iommu=off, machine has 8GB mem, dom0 limited to 1024M and just starting one domU with iommu=soft, with pci-passthrough and the USB pci-cards with USB videograbbers attached to it, i would expect to find some bounce buffering going on.
>
> (HV_START_LOW 18446603336221196288)
> (FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
> (VIRT_BASE 18446744071562067968)
> (GUEST_VERSION 2.6)
> (PADDR_OFFSET 0)
> (GUEST_OS linux)
> (HYPERCALL_PAGE 18446744071578849280)
> (LOADER generic)
> (SUSPEND_CANCEL 1)
> (PAE_MODE yes)
> (ENTRY 18446744071594476032)
> (XEN_VERSION xen-3.0)
>
> Still i only see:
>
> [   47.449072] Starting SWIOTLB debug thread.
> [   47.449090] swiotlb_start_thread: Go!
> [   47.449262] xen_swiotlb_start_thread: Go!
> [   52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0

There is bouncing there.

..
> [  172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0

And there.. 3839 of them.

> [  172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
> [  172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
> [  172.449170] SWIOTLB is 0% full
> [  177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0

And 5348 here! So bounce-buffering is definitely happening with this guest.

.. snip..
> And the devices do work ... so how does that work ...

Most (all?) drivers are written to work with bounce-buffering. That has never
been a problem.

The issue, as I understand it, is that the DVB drivers allocate their buffers
from 0->4GB most (all?) of the time, so they never have to do
bounce-buffering. While the pv-ops one ends up quite frequently doing the
bounce-buffering, which implies that the DVB drivers end up allocating their
buffers above the 4GB mark. This means we end up spending some CPU time (in
the guest) copying the memory from the >4GB region to the 0-4GB region (and
vice versa).

And I am not clear why this is happening. Hence my thought was to run a
Xen-O-Linux kernel v2.6.3X and a pvops v2.6.3X (where X is the same) with the
same PCI device (and the test would entail rebooting the box in between the
launches) to confirm that the Xen-O-Linux one is doing something that the
pvops one is not.

So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel, so :-(

> Thx for your explanation so far !

Sure thing.
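The decision that produces those counters can be summed up in one predicate. The sketch below shows the shape of the swiotlb check, not the literal lib/swiotlb.c or swiotlb-xen code; in the Xen variant, the phys-to-bus step also performs the pfn-to-mfn translation, which is why guest-physical layout alone does not predict bouncing.

    #include <linux/device.h>
    #include <linux/dma-mapping.h>

    /* Bounce only when the buffer's bus address is not reachable with the
     * device's DMA mask - e.g. a machine address above 4GB for a 32-bit card. */
    static bool demo_needs_bounce(struct device *dev, phys_addr_t phys, size_t size)
    {
            dma_addr_t bus = phys_to_dma(dev, phys);

            return !dma_capable(dev, bus, size);
    }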
On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:> > > > And the devices do work ... so how does that work ... > > Most (all?) drivers are written to work with bounce-buffering. > That has never been a problem. > > The issue as I understand is that the DVB drivers allocate their buffers > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > While the pv-ops one ends up quite frequently doing the bounce-buffering, which > implies that the DVB drivers end up allocating their buffers above the 4GB. > This means we end up spending some CPU time (in the guest) copying the memory > from >4GB to 0-4GB region (And vice-versa). > > And I am not clear why this is happening. Hence my thought > was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the > same) with the same PCI device (and the test would entail rebooting the > box in between the launches) to confirm that the Xen-O-Linux is doing something > that the PVOPS is not. > > So far, I''ve haven''t had much luck compiling a Xen-O-Linux v2.6.38 kernel > so :-( >Did you try downloading a binary rpm (or src.rpm) from OpenSuse? I think they have 2.6.38 xenlinux kernel available. -- Pasi
>>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > The issue as I understand is that the DVB drivers allocate their buffers > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > which > implies that the DVB drivers end up allocating their buffers above the 4GB. > This means we end up spending some CPU time (in the guest) copying the > memory > from >4GB to 0-4GB region (And vice-versa).This reminds me of something (not sure what XenoLinux you use for comparison) - how are they allocating that memory? Not vmalloc_32() by chance (I remember having seen numerous uses under - iirc - drivers/media/)? Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do what their (driver) callers might expect in a PV guest (including the contiguity assumption for the latter, recalling that you earlier said you were able to see the problem after several guest starts), and I had put into our kernels an adjustment to make vmalloc_32() actually behave as expected. Jan
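A small debugging illustration of Jan's point (not something posted in the thread; it assumes a Xen pvops kernel where pfn_to_mfn() is available, and the function name is made up): walking a vmalloc_32() buffer and comparing guest-physical frames with machine frames shows pages that honour GFP_DMA32 in pseudo-physical terms yet still sit above 4GB in machine terms.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>
    #include <asm/xen/page.h>

    static void demo_check_vmalloc32(size_t size)
    {
            void *buf = vmalloc_32(size);
            unsigned long off;

            if (!buf)
                    return;

            for (off = 0; off < size; off += PAGE_SIZE) {
                    struct page *pg = vmalloc_to_page(buf + off);
                    unsigned long pfn = page_to_pfn(pg);
                    unsigned long mfn = pfn_to_mfn(pfn);

                    /* The guest-physical frame honours GFP_DMA32, but the
                     * machine frame is what the card actually DMAs to. */
                    if (((u64)mfn << PAGE_SHIFT) >= (1ULL << 32))
                            pr_info("offset %lx: pfn %lx ok, mfn %lx above 4GB\n",
                                    off, pfn, mfn);
            }
            vfree(buf);
    }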
>>> On 18.01.12 at 12:28, Pasi Kärkkäinen<pasik@iki.fi> wrote: > On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote: >> > >> > And the devices do work ... so how does that work ... >> >> Most (all?) drivers are written to work with bounce-buffering. >> That has never been a problem. >> >> The issue as I understand is that the DVB drivers allocate their buffers >> from 0->4GB most (all the time?) so they never have to do bounce-buffering. >> >> While the pv-ops one ends up quite frequently doing the bounce-buffering, > which >> implies that the DVB drivers end up allocating their buffers above the 4GB. >> This means we end up spending some CPU time (in the guest) copying the > memory >> from >4GB to 0-4GB region (And vice-versa). >> >> And I am not clear why this is happening. Hence my thought >> was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the >> same) with the same PCI device (and the test would entail rebooting the >> box in between the launches) to confirm that the Xen-O-Linux is doing > something >> that the PVOPS is not. >> >> So far, I've haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel >> so :-( >> > > Did you try downloading a binary rpm (or src.rpm) from OpenSuse? > I think they have 2.6.38 xenlinux kernel available.openSUSE 11.4 is using 2.6.37; 12.1 is on 3.1 (and SLE is on 3.0). Pulling out (consistent) patches at 2.6.38 level might be a little involved. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2012-Jan-18 14:29 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:> >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > The issue as I understand is that the DVB drivers allocate their buffers > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > > which > > implies that the DVB drivers end up allocating their buffers above the 4GB. > > This means we end up spending some CPU time (in the guest) copying the > > memory > > from >4GB to 0-4GB region (And vice-versa). > > This reminds me of something (not sure what XenoLinux you use for > comparison) - how are they allocating that memory? Not vmalloc_32()I was using the 2.6.18, then the one I saw on Google for Gentoo, and now I am going to look at the 2.6.38 from OpenSuSE.> by chance (I remember having seen numerous uses under - iirc - > drivers/media/)? > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > what their (driver) callers might expect in a PV guest (including the > contiguity assumption for the latter, recalling that you earlier said > you were able to see the problem after several guest starts), and I > had put into our kernels an adjustment to make vmalloc_32() actually > behave as expected.Aaah.. The plot thickens! Let me look in the sources! Thanks for the pointer.
Konrad Rzeszutek Wilk
2012-Jan-23 22:32 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their buffers
> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > >
> > > While the pv-ops one ends up quite frequently doing the bounce-buffering,
> > > which
> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying the
> > > memory
> > > from >4GB to 0-4GB region (And vice-versa).
> >
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
>
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> I am going to look at the 2.6.38 from OpenSuSE.
>
> > by chance (I remember having seen numerous uses under - iirc -
> > drivers/media/)?
> >
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> > what their (driver) callers might expect in a PV guest (including the
> > contiguity assumption for the latter, recalling that you earlier said
> > you were able to see the problem after several guest starts), and I
> > had put into our kernels an adjustment to make vmalloc_32() actually
> > behave as expected.
>
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> pointer.

Jan's hints led me to videobuf-dma-sg.c, which does indeed do vmalloc_32
and then performs PCI DMA operations on the allocated vmalloc_32 area.

So I cobbled up the attached patch (hadn't actually tested it and sadly
won't until next week) which removes the call to vmalloc_32 and instead
sets up a DMA-allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please send me
your logs.

Cheers,
Konrad
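The attachment itself is not reproduced in this archive. The general shape of such a change would be something like the following: allocate the capture buffer page by page through the DMA API instead of vmalloc_32(), so every page is guaranteed to be reachable by the card. This is only an illustration of the idea - the names and structure are invented, and it is not the actual patch.

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>
    #include <linux/slab.h>

    /* Illustrative only: a buffer made of individually DMA-able pages. */
    struct demo_dma_buf {
            int nr_pages;
            void **vaddr;           /* kernel address of each page */
            dma_addr_t *bus;        /* bus address of each page */
    };

    static int demo_dma_buf_alloc(struct device *dev, struct demo_dma_buf *b,
                                  int nr_pages)
    {
            int i;

            b->nr_pages = nr_pages;
            b->vaddr = kcalloc(nr_pages, sizeof(*b->vaddr), GFP_KERNEL);
            b->bus = kcalloc(nr_pages, sizeof(*b->bus), GFP_KERNEL);
            if (!b->vaddr || !b->bus)
                    return -ENOMEM;         /* sketch: cleanup omitted */

            for (i = 0; i < nr_pages; i++) {
                    /* Each page comes back below the device's DMA mask, so the
                     * card can scatter-gather into it without any bouncing. */
                    b->vaddr[i] = dma_alloc_coherent(dev, PAGE_SIZE,
                                                     &b->bus[i], GFP_KERNEL);
                    if (!b->vaddr[i])
                            return -ENOMEM; /* sketch: cleanup omitted */
            }
            return 0;
    }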
>>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote: >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: >> > > The issue as I understand is that the DVB drivers allocate their buffers >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. >> > > >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, >> > > which >> > > implies that the DVB drivers end up allocating their buffers above the > 4GB. >> > > This means we end up spending some CPU time (in the guest) copying the >> > > memory >> > > from >4GB to 0-4GB region (And vice-versa). >> > >> > This reminds me of something (not sure what XenoLinux you use for >> > comparison) - how are they allocating that memory? Not vmalloc_32() >> >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now >> I am going to look at the 2.6.38 from OpenSuSE. >> >> > by chance (I remember having seen numerous uses under - iirc - >> > drivers/media/)? >> > >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do >> > what their (driver) callers might expect in a PV guest (including the >> > contiguity assumption for the latter, recalling that you earlier said >> > you were able to see the problem after several guest starts), and I >> > had put into our kernels an adjustment to make vmalloc_32() actually >> > behave as expected. >> >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the >> pointer. > > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 > and then performs PCI DMA operations on the allocted vmalloc_32 > area. > > So I cobbled up the attached patch (hadn''t actually tested it and sadly > won''t until next week) which removes the call to vmalloc_32 and instead > sets up DMA allocated set of pages.What a big patch (which would need re-doing for every vmalloc_32() caller)! Fixing vmalloc_32() would be much less intrusive (reproducing our 3.2 version of the affected function below, but clearly that''s not pv-ops ready). Jan static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot, int node, void *caller) { const int order = 0; struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; #ifdef CONFIG_XEN gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32)); if (dma_mask == (__GFP_DMA | __GFP_DMA32)) gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); #endif nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); area->nr_pages = nr_pages; /* Please note that the recursion is strictly bounded. 
*/ if (array_size > PAGE_SIZE) { pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM, PAGE_KERNEL, node, caller); area->flags |= VM_VPAGES; } else { pages = kmalloc_node(array_size, nested_gfp, node); } area->pages = pages; area->caller = caller; if (!area->pages) { remove_vm_area(area->addr); kfree(area); return NULL; } for (i = 0; i < area->nr_pages; i++) { struct page *page; gfp_t tmp_mask = gfp_mask | __GFP_NOWARN; if (node < 0) page = alloc_page(tmp_mask); else page = alloc_pages_node(node, tmp_mask, order); if (unlikely(!page)) { /* Successfully allocated i pages, free them in __vunmap() */ area->nr_pages = i; goto fail; } area->pages[i] = page; #ifdef CONFIG_XEN if (dma_mask) { if (xen_limit_pages_to_max_mfn(page, 0, 32)) { area->nr_pages = i + 1; goto fail; } if (gfp_mask & __GFP_ZERO) clear_highpage(page); } #endif } if (map_vm_area(area, prot, &pages)) goto fail; return area->addr; fail: warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, allocated %ld of %ld bytes\n", (area->nr_pages*PAGE_SIZE), area->size); vfree(area->addr); return NULL; } ... #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL #elif defined(CONFIG_XEN) #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL #else #define GFP_VMALLOC32 GFP_KERNEL #endif
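For readers without the tree at hand: the GFP_VMALLOC32 macro above is consumed by vmalloc_32() itself, which in kernels of that era is essentially the wrapper below (quoted from memory, so treat it as approximate); with the CONFIG_XEN branch selected, the __GFP_DMA|__GFP_DMA32 combination is what later triggers the xen_limit_pages_to_max_mfn() call in __vmalloc_area_node().

    void *vmalloc_32(unsigned long size)
    {
            return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL,
                                  -1, __builtin_return_address(0));
    }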
Konrad Rzeszutek Wilk
2012-Jan-24 14:17 UTC
Re: Load increase after memory upgrade (part2)
On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote:> >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote: > >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >> > > The issue as I understand is that the DVB drivers allocate their buffers > >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > >> > > > >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > >> > > which > >> > > implies that the DVB drivers end up allocating their buffers above the > > 4GB. > >> > > This means we end up spending some CPU time (in the guest) copying the > >> > > memory > >> > > from >4GB to 0-4GB region (And vice-versa). > >> > > >> > This reminds me of something (not sure what XenoLinux you use for > >> > comparison) - how are they allocating that memory? Not vmalloc_32() > >> > >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now > >> I am going to look at the 2.6.38 from OpenSuSE. > >> > >> > by chance (I remember having seen numerous uses under - iirc - > >> > drivers/media/)? > >> > > >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > >> > what their (driver) callers might expect in a PV guest (including the > >> > contiguity assumption for the latter, recalling that you earlier said > >> > you were able to see the problem after several guest starts), and I > >> > had put into our kernels an adjustment to make vmalloc_32() actually > >> > behave as expected. > >> > >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the > >> pointer. > > > > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 > > and then performs PCI DMA operations on the allocted vmalloc_32 > > area. > > > > So I cobbled up the attached patch (hadn''t actually tested it and sadly > > won''t until next week) which removes the call to vmalloc_32 and instead > > sets up DMA allocated set of pages. > > What a big patch (which would need re-doing for every vmalloc_32() > caller)! Fixing vmalloc_32() would be much less intrusive (reproducing > our 3.2 version of the affected function below, but clearly that''s not > pv-ops ready).I just want to get to the bottom of this before attempting a proper fix.> > Jan > > static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > pgprot_t prot, int node, void *caller) > { > const int order = 0; > struct page **pages; > unsigned int nr_pages, array_size, i; > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > #ifdef CONFIG_XEN > gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > > BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32)); > if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > #endif > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > array_size = (nr_pages * sizeof(struct page *)); > > area->nr_pages = nr_pages; > /* Please note that the recursion is strictly bounded. 
*/ > if (array_size > PAGE_SIZE) { > pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM, > PAGE_KERNEL, node, caller); > area->flags |= VM_VPAGES; > } else { > pages = kmalloc_node(array_size, nested_gfp, node); > } > area->pages = pages; > area->caller = caller; > if (!area->pages) { > remove_vm_area(area->addr); > kfree(area); > return NULL; > } > > for (i = 0; i < area->nr_pages; i++) { > struct page *page; > gfp_t tmp_mask = gfp_mask | __GFP_NOWARN; > > if (node < 0) > page = alloc_page(tmp_mask); > else > page = alloc_pages_node(node, tmp_mask, order); > > if (unlikely(!page)) { > /* Successfully allocated i pages, free them in __vunmap() */ > area->nr_pages = i; > goto fail; > } > area->pages[i] = page; > #ifdef CONFIG_XEN > if (dma_mask) { > if (xen_limit_pages_to_max_mfn(page, 0, 32)) { > area->nr_pages = i + 1; > goto fail; > } > if (gfp_mask & __GFP_ZERO) > clear_highpage(page); > } > #endif > } > > if (map_vm_area(area, prot, &pages)) > goto fail; > return area->addr; > > fail: > warn_alloc_failed(gfp_mask, order, > "vmalloc: allocation failure, allocated %ld of %ld bytes\n", > (area->nr_pages*PAGE_SIZE), area->size); > vfree(area->addr); > return NULL; > } > > ... > > #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) > #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL > #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) > #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL > #elif defined(CONFIG_XEN) > #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL > #else > #define GFP_VMALLOC32 GFP_KERNEL > #endif
Konrad,

I implemented the patch into a 3.1.2 kernel, but the patched function doesn't seem to be called (I set debug=1 for the module). I think it is only used for video capturing devices.

But I grepped around and found a vmalloc_32 in drivers/media/common/saa7146_core.c, line 182, function saa7146_vmalloc_build_pgtable, which is included in module saa7146.ko. This would be the DVB chip. Maybe you can rework the patch so that we can test what you intended to test. Consequently, the patch you did so far doesn't change the load.

Carsten.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On behalf of Konrad Rzeszutek Wilk
Sent: Monday, 23 January 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

[full quote of Konrad's message above snipped]
I can now confirm that saa7146_vmalloc_build_pgtable and vmalloc_to_sg are called once per PCI card and will allocate 329 pages. Sorry, but I am not in the position to modify your patch to patch the functions in the right way, but happy to test... BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad@darnok.org>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <JBeulich@suse.com>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Mo 23.01.2012 23:42 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:vmalloc On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > > The issue as I understand is that the DVB drivers allocate their buffers > > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > > > which > > > implies that the DVB drivers end up allocating their buffers above the 4GB. > > > This means we end up spending some CPU time (in the guest) copying the > > > memory > > > from >4GB to 0-4GB region (And vice-versa). > > > > This reminds me of something (not sure what XenoLinux you use for > > comparison) - how are they allocating that memory? Not vmalloc_32() > > I was using the 2.6.18, then the one I saw on Google for Gentoo, and now > I am going to look at the 2.6.38 from OpenSuSE. > > > by chance (I remember having seen numerous uses under - iirc - > > drivers/media/)? > > > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > > what their (driver) callers might expect in a PV guest (including the > > contiguity assumption for the latter, recalling that you earlier said > > you were able to see the problem after several guest starts), and I > > had put into our kernels an adjustment to make vmalloc_32() actually > > behave as expected. > > Aaah.. The plot thickens! Let me look in the sources! Thanks for the > pointer.Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area. So I cobbled up the attached patch (hadn''t actually tested it and sadly won''t until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages. If that fixes it for you that is awesome, but if it breaks please send me your logs. Cheers, Konrad _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
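For scale (my arithmetic, not a figure from the thread):

    329 pages x 4096 bytes/page = 1,347,584 bytes, i.e. roughly 1.3 MB of streaming buffer per card, about 2.6 MB for both cards.

If the machine frames behind those buffers end up above 4GB, portions of them are copied through the swiotlb bounce buffer on every interrupt, which would fit the roughly doubled idle load reported at the start of this thread.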
Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is reduced a bit, but noticably. It''s only a simple test, running the DomU for 2 minutes, but the idle load is aprox. - 2.6.32 pvops 12-13% - 3.2.1 pvops 10-11% - 2.6.34 XenoLinux 7-8% BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Montag, 23. Januar 2012 23:32 An: Konrad Rzeszutek Wilk Cc: Sander Eikelenboom; xen-devel; Jan Beulich Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > > > The issue as I understand is that the DVB drivers allocate their > > > buffers from 0->4GB most (all the time?) so they never have to do bounce-buffering. > > > > > > While the pv-ops one ends up quite frequently doing the > > > bounce-buffering, which implies that the DVB drivers end up > > > allocating their buffers above the 4GB. > > > This means we end up spending some CPU time (in the guest) copying > > > the memory from >4GB to 0-4GB region (And vice-versa). > > > > This reminds me of something (not sure what XenoLinux you use for > > comparison) - how are they allocating that memory? Not vmalloc_32() > > I was using the 2.6.18, then the one I saw on Google for Gentoo, and > now I am going to look at the 2.6.38 from OpenSuSE. > > > by chance (I remember having seen numerous uses under - iirc - > > drivers/media/)? > > > > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > > what their (driver) callers might expect in a PV guest (including > > the contiguity assumption for the latter, recalling that you earlier > > said you were able to see the problem after several guest starts), > > and I had put into our kernels an adjustment to make vmalloc_32() > > actually behave as expected. > > Aaah.. The plot thickens! Let me look in the sources! Thanks for the > pointer.Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area. So I cobbled up the attached patch (hadn''t actually tested it and sadly won''t until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages. If that fixes it for you that is awesome, but if it breaks please send me your logs. Cheers, Konrad _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2012-Jan-25 21:02 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approx.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%

Yeah. I think this is due to the fix I added in xen-swiotlb to not always do the bounce copying.

> - 2.6.34 XenoLinux 7-8%
>
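The fix referred to here is in the xen-swiotlb map path. Simplified - a sketch of the decision, not a verbatim quote of drivers/xen/swiotlb-xen.c - the idea is to bounce only when the machine address of the buffer is not reachable by the device:

    #include <linux/dma-mapping.h>
    #include <asm/xen/page.h>

    /* Sketch only: does this buffer need the bounce buffer for this device? */
    static bool needs_bounce(struct device *dev, phys_addr_t phys, size_t size)
    {
        /* Translate the guest pseudo-physical address into the machine
         * address the card will actually see on the bus. */
        dma_addr_t dev_addr = phys_to_machine(XPADDR(phys)).maddr;

        /* The real code also worries about buffers straddling machine-
         * discontiguous pages and about swiotlb=force; omitted here. */
        return !dma_capable(dev, dev_addr, size);
    }

When the backing machine frame is below the device's 4GB limit this returns false and the buffer is handed over in place; otherwise the data is first copied into a bounce slot below 4GB, and that copying is the extra CPU time the DomU gets charged for.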
Konrad Rzeszutek Wilk
2012-Feb-15 19:28 UTC
Re: Load increase after memory upgrade (part2)
On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approx.
>
> - 2.6.32 pvops 12-13%
> - 3.2.1 pvops 10-11%
> - 2.6.34 XenoLinux 7-8%

I took a stab at Jan's idea - it compiles but I haven't been able to properly test it.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
>>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 	struct page **pages;
> 	unsigned int nr_pages, array_size, i;
> 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>-
>+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
>+	if (xen_pv_domain()) {
>+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

I didn't spot where you force this normally invalid combination, without
which the change won't affect vmalloc_32() in a 32-bit kernel.

>+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);

gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);

Jan

>+	}
> 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> 	array_size = (nr_pages * sizeof(struct page *));
>
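For anyone skimming the archive, the one-character difference Jan points out here is the usual clear-bits idiom; a stand-alone illustration (the flag values are made up, not the kernel's):

    #include <stdio.h>

    #define __GFP_DMA   0x01u   /* illustrative values only */
    #define __GFP_DMA32 0x04u
    #define GFP_KERNEL  0xd0u

    int main(void)
    {
        unsigned int keep_only = GFP_KERNEL | __GFP_DMA | __GFP_DMA32;
        unsigned int cleared   = keep_only;

        keep_only &=  (__GFP_DMA | __GFP_DMA32);  /* keeps ONLY the DMA bits */
        cleared   &= ~(__GFP_DMA | __GFP_DMA32);  /* strips the DMA bits     */

        printf("%#x %#x\n", keep_only, cleared);  /* prints 0x5 0xd0 */
        return 0;
    }

With &= (mask) it is the mask bits that survive, so GFP_KERNEL would be thrown away and only the DMA flags kept; with &= ~(mask) the DMA flags are cleared, which is what the patch intends.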
Konrad Rzeszutek Wilk
2012-Feb-17 15:07 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > 	struct page **pages;
> > 	unsigned int nr_pages, array_size, i;
> > 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+	if (xen_pv_domain()) {
> >+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
>
> >+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);
>
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
>
> Jan

Duh! Good eyes. Thanks for catching that.

> >+	}
> > 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > 	array_size = (nr_pages * sizeof(struct page *));
> >
>
Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Great news: it works and load is back to normal. In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
I am very sorry. I accidently started the DomU with the wrong config file, thus it''s clear why there is no difference between the two. And unfortunately, the DomU with the correct config file is having a BUG: [ 14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000 [ 14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.674930] PGD 0 [ 14.674940] Oops: 0002 [#1] SMP [ 14.674952] CPU 0 [ 14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront [ 14.675057] [ 14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1 [ 14.675079] RIP: e030:[<ffffffff811b4c0b>] [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.675097] RSP: e02b:ffff880013fabe58 EFLAGS: 00010202 [ 14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000 [ 14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000 [ 14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000 [ 14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090 [ 14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8 [ 14.675163] FS: 00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000 [ 14.675175] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660 [ 14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020) [ 14.675227] Stack: [ 14.675232] ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000 [ 14.675251] 00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd [ 14.675270] ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0 [ 14.675289] Call Trace: [ 14.675295] <IRQ> [ 14.675307] [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47 [ 14.675322] [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core] [ 14.675337] [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184 [ 14.675350] [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8 [ 14.675364] [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5 [ 14.675376] [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77 [ 14.675388] [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0 [ 14.675400] [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205 [ 14.675412] [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30 [ 14.675425] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79 [ 14.675436] [<ffffffff8104c996>] ? irq_exit+0x44/0xb5 [ 14.675452] [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32 [ 14.675464] [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30 [ 14.675473] <EOI> Complete log is attached. BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 13:16 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Great news: it works and load is back to normal. 
In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Hi Konrad, don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG, so we still have not checked whether our theory is correct. BR, Carsten. -----Ursprüngliche Nachricht----- Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 14:01 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:debug.log, inline.txt An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; I am very sorry. I accidently started the DomU with the wrong config file, thus it''s clear why there is no difference between the two. And unfortunately, the DomU with the correct config file is having a BUG: [ 14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000 [ 14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.674930] PGD 0 [ 14.674940] Oops: 0002 [#1] SMP [ 14.674952] CPU 0 [ 14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront [ 14.675057] [ 14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1 [ 14.675079] RIP: e030:[<ffffffff811b4c0b>] [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31 [ 14.675097] RSP: e02b:ffff880013fabe58 EFLAGS: 00010202 [ 14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000 [ 14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000 [ 14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000 [ 14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090 [ 14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8 [ 14.675163] FS: 00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000 [ 14.675175] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660 [ 14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020) [ 14.675227] Stack: [ 14.675232] ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000 [ 14.675251] 00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd [ 14.675270] ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0 [ 14.675289] Call Trace: [ 14.675295] <IRQ> [ 14.675307] [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47 [ 14.675322] [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core] [ 14.675337] [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184 [ 14.675350] [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8 [ 14.675364] [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5 [ 14.675376] [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77 [ 14.675388] [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0 [ 14.675400] [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205 [ 14.675412] [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30 [ 14.675425] [<ffffffff8100fa47>] ? 
do_softirq+0x3f/0x79 [ 14.675436] [<ffffffff8104c996>] ? irq_exit+0x44/0xb5 [ 14.675452] [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32 [ 14.675464] [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30 [ 14.675473] <EOI> Complete log is attached. BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Mi 29.02.2012 13:16 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Great news: it works and load is back to normal. In the attached graph you can see the peak in blue (compilation of the patched 3.2.8 Kernel) and then after 16.00 the going life of the video DomU. We are below an avaerage of 7% usage (figures are in Permille). Thanks so much. Is that already "the final patch"? BR, Carsten. -----Ursprüngliche Nachricht----- An:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; CC:Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; Von:Carsten Schiers <carsten@schiers.de> Gesendet:Di 28.02.2012 15:39 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) Anlage:inline.txt Well let me check for a longer period of time, and especially, whether the DomU is still working (can do that only from at home), but load looks pretty well after applying the patch to 3.2.8 :-D. BR, Carsten. -----Ursprüngliche Nachricht----- An:Jan Beulich <JBeulich@suse.com>; CC:Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; Von:Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Gesendet:Fr 17.02.2012 16:18 Betreff:Re: [Xen-devel] Load increase after memory upgrade (part2) On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > > struct page **pages; > > unsigned int nr_pages, array_size, i; > > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > >- > >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > >+if (xen_pv_domain()) { > >+if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > > I didn''t spot where you force this normally invalid combination, without > which the change won''t affect vmalloc32() in a 32-bit kernel. > > >+gfp_mask &= (__GFP_DMA | __GFP_DMA32); > > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > > JanDuh! Good eyes. Thanks for catching that.> > >+} > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > > array_size = (nr_pages * sizeof(struct page *)); > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel -------------------------------- E-Mail ist virenfrei. Von AVG überprüft - www.avg.de Version: 2012.0.2127 / Virendatenbank: 2411/4932 - Ausgabedatum: 12.04.2012 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-May-11 19:41 UTC
Re: Load increase after memory upgrade (part2)
On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> Hi Konrad,
>
> don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load.
>
> But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG,

Yes, that is right. Thank you for reminding me.

> so we still have not checked whether our theory is correct.

No we haven't. And I should have no trouble reproducing this. I can just write
a tiny module that allocates vmalloc_32(). But your timing sucks - I am going
on a week's vacation next week :-(

Ah, if there were just a cloning machine - I could stick myself in it, and
Baseline_0 goes on vacation, while Clone_1 goes on working. Then git merge
Baseline_0 and Clone_1 in a week, fix up the merge conflicts and continue on.
Sigh.

Can I ask you to be patient with me once more and ping me in a week - when I am
back from vacation and my brain is fresh to work on this?
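A sketch of the kind of throw-away test module described above (my own reconstruction, untested, assuming a 64-bit PV DomU with CONFIG_XEN): it allocates a buffer the size of one saa7146 ring with vmalloc_32() and counts how many of the backing machine frames actually sit above 4GB.

    #include <linux/module.h>
    #include <linux/vmalloc.h>
    #include <linux/mm.h>
    #include <asm/xen/page.h>

    #define TEST_PAGES 329  /* same size as one saa7146 buffer, per the report above */

    static void *buf;

    static int __init vm32_test_init(void)
    {
        int i, above = 0;

        buf = vmalloc_32(TEST_PAGES * PAGE_SIZE);
        if (!buf)
            return -ENOMEM;

        for (i = 0; i < TEST_PAGES; i++) {
            unsigned long pfn = page_to_pfn(vmalloc_to_page(buf + i * PAGE_SIZE));
            unsigned long mfn = pfn_to_mfn(pfn);

            /* frames at or above 4GB cannot be handed to a 32-bit DMA engine */
            if (mfn >= (1UL << (32 - PAGE_SHIFT)))
                above++;
        }
        pr_info("vm32_test: %d of %d pages are backed by machine frames above 4GB\n",
                above, TEST_PAGES);
        return 0;
    }

    static void __exit vm32_test_exit(void)
    {
        vfree(buf);
    }

    module_init(vm32_test_init);
    module_exit(vm32_test_exit);
    MODULE_LICENSE("GPL");

With the Xenified kernel, or with the exchange patch applied, the count should be zero; with plain mainline on the 8GB box it should be large, which would confirm the theory directly.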
Konrad Rzeszutek Wilk
2012-Jun-13 16:55 UTC
Re: Load increase after memory upgrade (part2)
On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can just write > a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. */ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); +#endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + 
if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); +#endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. */ - success = xen_exchange_memory(1, order, &in_frame, 1UL << order, - 0, out_frames, 0); + success = xen_exchange_memory(1, order, &in_frame, + 1UL << order, 0, out_frames, 0); /* 4. Map new pages in place of old pages. */ if (success) @@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) } EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits) +{ + unsigned long *in_frames = discontig_frames, *out_frames = limited_frames; + unsigned long flags; + struct page *page; + int success; + int i, n = 0; + unsigned long _limit_map; + unsigned long *limit_map; + + if (xen_feature(XENFEAT_auto_translated_physmap)) + return 0; + + if (unlikely(order > MAX_CONTIG_ORDER)) + return -ENOMEM; + + if (BITS_PER_LONG >> order) { + limit_map = kzalloc(BITS_TO_LONGS(1U << order) * + sizeof(*limit_map), GFP_KERNEL); + if (unlikely(!limit_map)) + return -ENOMEM; + } else + limit_map = &_limit_map; + + /* 0. Construct our per page bitmap lookup. */ + + if (address_bits && (address_bits < PAGE_SHIFT)) + return -EINVAL; + + if (order) + bitmap_zero(limit_map, 1U << order); + else + __set_bit(0, limit_map); + + /* 1. Clear the pages */ + for (i = 0; i < (1ULL << order); i++) { + void *vaddr; + page = &pages[i]; + + vaddr = page_address(page); +#ifdef DEBUG + printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr))); +#endif + if (address_bits) { + if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT))) + continue; + __set_bit(i, limit_map); + } + if (!PageHighMem(page)) + memset(vaddr, 0, PAGE_SIZE); + else { + memset(kmap(page), 0, PAGE_SIZE); + kunmap(page); + ++n; + } + } + /* Check to see if we actually have to do any work. 
*/ + if (bitmap_empty(limit_map, 1U << order)) { + if (limit_map != &_limit_map) + kfree(limit_map); + return 0; + } + if (n) + kmap_flush_unused(); + + spin_lock_irqsave(&xen_reservation_lock, flags); + + /* 2. Zap current PTEs. */ + n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map); + + /* 3. Do the exchange for non-contiguous MFNs. */ + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames, + n, 0, out_frames, address_bits); + + /* 4. Map new pages in place of old pages. */ + if (success) + xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map); + else + xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map); + + spin_unlock_irqrestore(&xen_reservation_lock, flags); + if (limit_map != &_limit_map) + kfree(limit_map); + + return success ? 0 : -ENOMEM; +} +EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn); #ifdef CONFIG_XEN_PVHVM static void xen_hvm_exit_mmap(struct mm_struct *mm) { diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 6a198e4..2f8709f 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, unsigned long mfn, int nr, pgprot_t prot, unsigned domid); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits); #endif /* INCLUDE_XEN_OPS_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 2aad499..194af07 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -31,6 +31,8 @@ #include <asm/tlbflush.h> #include <asm/shmparam.h> +#include <xen/xen.h> +#include <xen/xen-ops.h> /*** Page table manipulation functions ***/ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end) @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; - + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); + if (xen_pv_domain()) { + if (dma_mask == (__GFP_DMA | __GFP_DMA32)) + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); + } nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; } area->pages[i] = page; + if (xen_pv_domain()) { + if (dma_mask) { + if (xen_limit_pages_to_max_mfn(page, 0, 32)) { + area->nr_pages = i + 1; + goto fail; + } + if (gfp_mask & __GFP_ZERO) + clear_highpage(page); + } + } } if (map_vm_area(area, prot, &pages)) -- 1.7.7.6
>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever
set both flags at once, this whole conditional is meaningless. In our
code - which I suppose is where you cloned this from - we set
GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would
merely use GFP_KERNEL, and hence not trigger the code calling
xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's
problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>
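To make that concrete: with the stock mainline definitions, vmalloc_32() passes GFP_DMA32 | GFP_KERNEL on a 64-bit kernel and plain GFP_KERNEL on 32-bit, so the dma_mask == (__GFP_DMA | __GFP_DMA32) comparison never fires on any stock configuration. As far as one can tell from the quoted patch, the per-page exchange further down still runs on a 64-bit DomU (dma_mask is then __GFP_DMA32 alone), while a 32-bit DomU gets no help at all - which is Jan's point. The XenoLinux variant quoted near the top of this page is what makes the both-flags case possible:

    /* mm/vmalloc.c - stock mainline definitions at the time of this thread */
    #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
    #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL   /* 64-bit DomU: __GFP_DMA32 only */
    #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
    #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
    #else
    #define GFP_VMALLOC32 GFP_KERNEL               /* 32-bit: no DMA flag at all */
    #endif

    /* The XenoLinux chain adds, before the final #else, a branch that sets
     * both flags on 32-bit Xen kernels - a sketch of the companion hunk a
     * mainline version of this patch would also need:
     *
     *  #elif defined(CONFIG_XEN)
     *  #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
     */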
On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
>
> +	/* 3. Do the exchange for non-contiguous MFNs. */
> +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> +				      n, 0, out_frames, address_bits);

vmalloc() does not require physically contiguous MFNs.

David
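In other words, a vmalloc area is stitched together page by page, so machine-frame contiguity never matters here - only whether each individual frame is reachable by the card. A minimal illustration (not code from the thread):

    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* Pages whose machine frames are scattered across RAM can still be
     * mapped virtually contiguous; vmap() neither needs nor provides
     * machine contiguity. */
    static void *map_scattered_pages(struct page **pages, unsigned int nr)
    {
        return vmap(pages, nr, VM_MAP, PAGE_KERNEL);
    }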
Konrad Rzeszutek Wilk
2012-Jun-14 18:31 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Jun 14, 2012 at 09:38:31AM +0100, David Vrabel wrote:
> On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
> >
> > +	/* 3. Do the exchange for non-contiguous MFNs. */
> > +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> > +				      n, 0, out_frames, address_bits);
>
> vmalloc() does not require physically contiguous MFNs.

<nods> It doesn't matter that much in this context as the vmalloc code calls this
per page - so it is only one page that is swizzled.

>
> David
Konrad Rzeszutek Wilk
2012-Jun-14 18:33 UTC
Re: Load increase after memory upgrade (part2)
On Thu, Jun 14, 2012 at 08:07:55AM +0100, Jan Beulich wrote:
> >>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	struct page **pages;
> >  	unsigned int nr_pages, array_size, i;
> >  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -
> > +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> > +	if (xen_pv_domain()) {
> > +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
>
> As said in an earlier reply - without having any place that would
> ever set both flags at once, this whole conditional is meaningless.
> In our code - which I suppose is where you cloned this from - we

Yup.

> set GFP_VMALLOC32 to such a value for 32-bit kernels (which
> otherwise would merely use GFP_KERNEL, and hence not trigger

Ah, let me double check. Thanks for looking out for this.

> the code calling xen_limit_pages_to_max_mfn()). I don't recall
> though whether Carsten's problem was on a 32- or 64-bit kernel.
>
> Jan
>
> > +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> > +	}
> >  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> >  	array_size = (nr_pages * sizeof(struct page *));
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad, against which kernel version did you produce this patch? It will not succeed with 3.4.2 at least, will look up some older version now... -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Mittwoch, 13. Juni 2012 18:55 An: Carsten Schiers Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the > > latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can > just write a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. 
*/ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); #endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); #endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. 
*/ - success = xen_exchange_memory(1, order, &in_frame, 1UL << order, - 0, out_frames, 0); + success = xen_exchange_memory(1, order, &in_frame, + 1UL << order, 0, out_frames, 0); /* 4. Map new pages in place of old pages. */ if (success) @@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) } EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits) +{ + unsigned long *in_frames = discontig_frames, *out_frames = limited_frames; + unsigned long flags; + struct page *page; + int success; + int i, n = 0; + unsigned long _limit_map; + unsigned long *limit_map; + + if (xen_feature(XENFEAT_auto_translated_physmap)) + return 0; + + if (unlikely(order > MAX_CONTIG_ORDER)) + return -ENOMEM; + + if (BITS_PER_LONG >> order) { + limit_map = kzalloc(BITS_TO_LONGS(1U << order) * + sizeof(*limit_map), GFP_KERNEL); + if (unlikely(!limit_map)) + return -ENOMEM; + } else + limit_map = &_limit_map; + + /* 0. Construct our per page bitmap lookup. */ + + if (address_bits && (address_bits < PAGE_SHIFT)) + return -EINVAL; + + if (order) + bitmap_zero(limit_map, 1U << order); + else + __set_bit(0, limit_map); + + /* 1. Clear the pages */ + for (i = 0; i < (1ULL << order); i++) { + void *vaddr; + page = &pages[i]; + + vaddr = page_address(page); +#ifdef DEBUG + printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", +__func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr))); #endif + if (address_bits) { + if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT))) + continue; + __set_bit(i, limit_map); + } + if (!PageHighMem(page)) + memset(vaddr, 0, PAGE_SIZE); + else { + memset(kmap(page), 0, PAGE_SIZE); + kunmap(page); + ++n; + } + } + /* Check to see if we actually have to do any work. */ + if (bitmap_empty(limit_map, 1U << order)) { + if (limit_map != &_limit_map) + kfree(limit_map); + return 0; + } + if (n) + kmap_flush_unused(); + + spin_lock_irqsave(&xen_reservation_lock, flags); + + /* 2. Zap current PTEs. */ + n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, +limit_map); + + /* 3. Do the exchange for non-contiguous MFNs. */ + success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames, + n, 0, out_frames, address_bits); + + /* 4. Map new pages in place of old pages. */ + if (success) + xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map); + else + xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map); + + spin_unlock_irqrestore(&xen_reservation_lock, flags); + if (limit_map != &_limit_map) + kfree(limit_map); + + return success ? 
0 : -ENOMEM; +} +EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn); #ifdef CONFIG_XEN_PVHVM static void xen_hvm_exit_mmap(struct mm_struct *mm) { diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 6a198e4..2f8709f 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, unsigned long mfn, int nr, pgprot_t prot, unsigned domid); +int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order, + unsigned int address_bits); #endif /* INCLUDE_XEN_OPS_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 2aad499..194af07 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -31,6 +31,8 @@ #include <asm/tlbflush.h> #include <asm/shmparam.h> +#include <xen/xen.h> +#include <xen/xen-ops.h> /*** Page table manipulation functions ***/ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end) @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, struct page **pages; unsigned int nr_pages, array_size, i; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; - + gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); + if (xen_pv_domain()) { + if (dma_mask == (__GFP_DMA | __GFP_DMA32)) + gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); + } nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; array_size = (nr_pages * sizeof(struct page *)); @@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; } area->pages[i] = page; + if (xen_pv_domain()) { + if (dma_mask) { + if (xen_limit_pages_to_max_mfn(page, 0, 32)) { + area->nr_pages = i + 1; + goto fail; + } + if (gfp_mask & __GFP_ZERO) + clear_highpage(page); + } + } } if (map_vm_area(area, prot, &pages)) -- 1.7.7.6 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ----- E-Mail ist virenfrei. Von AVG überprüft - www.avg.de Version: 2012.0.2180 / Virendatenbank: 2433/5067 - Ausgabedatum: 13.06.2012
It's a 64-bit kernel...

-----Ursprüngliche Nachricht-----
Von: Jan Beulich [mailto:JBeulich@suse.com]
Gesendet: Donnerstag, 14. Juni 2012 09:08
An: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk; Sander Eikelenboom; xen-devel; Carsten Schiers
Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2)

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever set both flags at once, this whole conditional is meaningless. In our code - which I suppose is where you cloned this from - we set GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would merely use GFP_KERNEL, and hence not trigger the code calling xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>
OK, found the problem in the patch file, baking 3.4.2...BR, Carsten. -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Carsten Schiers Gesendet: Donnerstag, 14. Juni 2012 20:40 An: Konrad Rzeszutek Wilk Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) Konrad, against which kernel version did you produce this patch? It will not succeed with 3.4.2 at least, will look up some older version now... -----Ursprüngliche Nachricht----- Von: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] Im Auftrag von Konrad Rzeszutek Wilk Gesendet: Mittwoch, 13. Juni 2012 18:55 An: Carsten Schiers Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom Betreff: Re: [Xen-devel] Load increase after memory upgrade (part2) On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote: > > Hi Konrad, > > > > > > don''t want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. > > > > But I think this mistery is still open. My last status was that the > > latest patch you produced resulted in a BUG, > > Yes, that is right. Thank you for reminding me. > > > > so we still have not checked whether our theory is correct. > > No we haven''t. And I should be have no trouble reproducing this. I can > just write a tiny module that allocates vmalloc_32().Done. Found some bugs.. and here is anew version. Can you please try it out? It has the #define DEBUG 1 set so it should print a lot of stuff when the DVB module loads. If it crashes please send me the full log. Thanks. From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Date: Thu, 31 May 2012 14:21:04 -0400 Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA. [v3] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++- include/xen/xen-ops.h | 2 + mm/vmalloc.c | 18 +++++- 3 files changed, 202 insertions(+), 5 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 3a73785..960d206 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -47,6 +47,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/seq_file.h> +#include <linux/slab.h> #include <trace/events/xen.h> @@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void) /* Protected by xen_reservation_lock. 
*/ #define MAX_CONTIG_ORDER 9 /* 2MB */ static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER]; +static unsigned long limited_frames[1<<MAX_CONTIG_ORDER]; #define VOID_PTE (mfn_pte(0, __pgprot(0))) static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, @@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, } xen_mc_issue(0); } +static int xen_zap_page_range(struct page *pages, unsigned int order, + unsigned long *in_frames, + unsigned long *out_frames, + void *limit_bitmap) +{ + int i, n = 0; + struct multicall_space mcs; + struct page *page; + + xen_mc_batch(); + for (i = 0; i < (1UL<<order); i++) { + if (!test_bit(i, limit_bitmap)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); +#define DEBUG 1 + if (in_frames) { +#ifdef DEBUG + printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n", + __func__, i, page_to_pfn(page), + pfn_to_mfn(page_to_pfn(page)), page_address(page)); #endif + in_frames[i] = pfn_to_mfn(page_to_pfn(page)); + } + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0); + set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY); + + if (out_frames) + out_frames[i] = page_to_pfn(page); + ++n; + + } + xen_mc_issue(0); + return n; +} /* * Update the pfn-to-mfn mappings for a virtual address range, either to @@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order, xen_mc_issue(0); } +static void xen_remap_exchanged_pages(struct page *pages, int order, + unsigned long *mfns, + unsigned long first_mfn, /* in_frame if we failed*/ + void *limit_map) +{ + unsigned i, limit; + unsigned long mfn; + struct page *page; + + xen_mc_batch(); + + limit = 1ULL << order; + for (i = 0; i < limit; i++) { + struct multicall_space mcs; + unsigned flags; + + if (!test_bit(i, limit_map)) + continue; + + page = &pages[i]; + mcs = __xen_mc_entry(0); + if (mfns) + mfn = mfns[i]; + else + mfn = first_mfn + i; + + if (i < (limit - 1)) + flags = 0; + else { + if (order == 0) + flags = UVMF_INVLPG | UVMF_ALL; + else + flags = UVMF_TLB_FLUSH | UVMF_ALL; + } +#ifdef DEBUG + printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n", + __func__, i, page_to_pfn(page), mfn, page_address(page)); #endif + MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), + mfn_pte(mfn, PAGE_KERNEL), flags); + + set_phys_to_machine(page_to_pfn(page), mfn); + } + + xen_mc_issue(0); +} + /* * Perform the hypercall to exchange a region of our pfns to point to @@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, { long rc; int success; - +#ifdef DEBUG + int i; +#endif struct xen_memory_exchange exchange = { .in = { .nr_extents = extents_in, @@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in, rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange); success = (exchange.nr_exchanged == extents_in); - +#ifdef DEBUG + for (i = 0; i < exchange.nr_exchanged; i++) { + printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n", __func__,pfns_in[i], mfns_out[i]); + } +#endif BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0))); BUG_ON(success && (rc != 0)); @@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order) xen_zap_pfn_range(vstart, order, NULL, out_frames); /* 3. Do the exchange for non-contiguous MFNs. 
-	success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
-				      0, out_frames, 0);
+	success = xen_exchange_memory(1, order, &in_frame,
+				      1UL << order, 0, out_frames, 0);
 
 	/* 4. Map new pages in place of old pages. */
 	if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if (BITS_PER_LONG >> order) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+		return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < (1ULL << order); i++) {
+		void *vaddr;
+		page = &pages[i];
+
+		vaddr = page_address(page);
+#ifdef DEBUG
+		printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n",
+			__func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+		if (address_bits) {
+			if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */,
+			       limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+				if (gfp_mask & __GFP_ZERO)
+					clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))
-- 
1.7.7.6

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
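As an aside, the "tiny module that allocates vmalloc_32()" that Konrad mentions above can be sketched in a few lines without any DVB hardware. The sketch below is only an illustration and not part of the patch or the thread: the module name, the nr_pages default and the log text are invented here; it simply grabs a vmalloc_32() buffer, touches every page, logs the PFN backing each page and frees the buffer on unload.

/*
 * vm32_test: minimal sketch of a module that allocates vmalloc_32().
 * Illustrative only - names and defaults are not taken from the thread.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

static void *buf;
static unsigned long nr_pages = 256;	/* 1 MB worth of pages by default */
module_param(nr_pages, ulong, 0444);

static int __init vm32_test_init(void)
{
	unsigned long i;

	/* vmalloc_32() is the allocation path under discussion in this thread. */
	buf = vmalloc_32(nr_pages * PAGE_SIZE);
	if (!buf)
		return -ENOMEM;

	/* Touch every page and log which PFN backs it. */
	for (i = 0; i < nr_pages; i++) {
		struct page *page = vmalloc_to_page(buf + i * PAGE_SIZE);

		memset(buf + i * PAGE_SIZE, 0, PAGE_SIZE);
		pr_info("vm32_test: page %lu -> pfn 0x%lx\n",
			i, page_to_pfn(page));
	}
	return 0;
}

static void __exit vm32_test_exit(void)
{
	vfree(buf);
}

module_init(vm32_test_init);
module_exit(vm32_test_exit);
MODULE_LICENSE("GPL");

Loading something like this in the 8GB domU should also fire the DEBUG printks added by the patch, since vmalloc_32() requests __GFP_DMA32 on 64-bit; those printks already report the pfn/mfn pairs being exchanged, so no extra instrumentation would be needed to see whether the exchange path is taken.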