Dongxiao Xu
2012-Dec-06 13:08 UTC
[PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
While mapping sg buffers, checking for cross-page DMA buffers is
also needed. If a guest DMA buffer crosses a page boundary, Xen
should exchange contiguous memory for it.

Besides, the original page contents need to be backed up and
copied back after the memory exchange is done.

This fixes issues where a device DMAs into software static
buffers, in case a static buffer crosses a page boundary whose
pages are not contiguous in real hardware.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
---
 drivers/xen/swiotlb-xen.c |   47 ++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 58db6df..e8f0cfb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -461,6 +461,22 @@ xen_swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_sync_single_for_device);
 
+static bool
+check_continguous_region(unsigned long vstart, unsigned long order)
+{
+	unsigned long prev_ma = xen_virt_to_bus((void *)vstart);
+	unsigned long next_ma;
+	int i;
+
+	for (i = 1; i < (1 << order); i++) {
+		next_ma = xen_virt_to_bus((void *)(vstart + i * PAGE_SIZE));
+		if (next_ma != prev_ma + PAGE_SIZE)
+			return false;
+		prev_ma = next_ma;
+	}
+	return true;
+}
+
 /*
  * Map a set of buffers described by scatterlist in streaming mode for DMA.
  * This is the scatter-gather version of the above xen_swiotlb_map_page
@@ -489,7 +505,36 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 
 	for_each_sg(sgl, sg, nelems, i) {
 		phys_addr_t paddr = sg_phys(sg);
-		dma_addr_t dev_addr = xen_phys_to_bus(paddr);
+		unsigned long vstart, order;
+		dma_addr_t dev_addr;
+
+		/*
+		 * While mapping sg buffers, checking for cross-page DMA
+		 * buffers is also needed. If the guest DMA buffer crosses
+		 * a page boundary, Xen should exchange contiguous memory
+		 * for it. Besides, the original page contents need to be
+		 * backed up and copied back after the memory exchange is
+		 * done.
+		 */
+		if (range_straddles_page_boundary(paddr, sg->length)) {
+			vstart = (unsigned long)__va(paddr & PAGE_MASK);
+			order = get_order(sg->length + (paddr & ~PAGE_MASK));
+			if (!check_continguous_region(vstart, order)) {
+				unsigned long buf;
+				buf = __get_free_pages(GFP_KERNEL, order);
+				memcpy((void *)buf, (void *)vstart,
+				       PAGE_SIZE * (1 << order));
+				if (xen_create_contiguous_region(vstart, order,
+								 fls64(paddr))) {
+					free_pages(buf, order);
+					return 0;
+				}
+				memcpy((void *)vstart, (void *)buf,
+				       PAGE_SIZE * (1 << order));
+				free_pages(buf, order);
+			}
+		}
+
+		dev_addr = xen_phys_to_bus(paddr);
 
 		if (swiotlb_force ||
 		    !dma_capable(hwdev, dev_addr, sg->length) ||
-- 
1.7.1
Jan Beulich
2012-Dec-06 13:37 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>>> On 06.12.12 at 14:08, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> While mapping sg buffers, checking to cross page DMA buffer is
> also needed. If the guest DMA buffer crosses page boundary, Xen
> should exchange contiguous memory for it.
>
> Besides, it is needed to backup the original page contents
> and copy it back after memory exchange is done.
>
> This fixes issues if device DMA into software static buffers,
> and in case the static buffer cross page boundary which pages are
> not contiguous in real hardware.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
> ---
>  drivers/xen/swiotlb-xen.c |   47 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 46 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 58db6df..e8f0cfb 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -461,6 +461,22 @@ xen_swiotlb_sync_single_for_device(struct device *hwdev, dma_addr_t dev_addr,
>  }
>  EXPORT_SYMBOL_GPL(xen_swiotlb_sync_single_for_device);
>
> +static bool
> +check_continguous_region(unsigned long vstart, unsigned long order)

check_continguous_region(unsigned long vstart, unsigned int order)

But - why do you need to do this check order based in the first
place? Checking the actual length of the buffer should suffice.

> +{
> +	unsigned long prev_ma = xen_virt_to_bus((void *)vstart);
> +	unsigned long next_ma;

phys_addr_t or some such for both of them.

> +	int i;

unsigned long

> +
> +	for (i = 1; i < (1 << order); i++) {

1UL

> +		next_ma = xen_virt_to_bus((void *)(vstart + i * PAGE_SIZE));
> +		if (next_ma != prev_ma + PAGE_SIZE)
> +			return false;
> +		prev_ma = next_ma;
> +	}
> +	return true;
> +}
> +
>  /*
>   * Map a set of buffers described by scatterlist in streaming mode for DMA.
>   * This is the scatter-gather version of the above xen_swiotlb_map_page
> @@ -489,7 +505,36 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
>
>  	for_each_sg(sgl, sg, nelems, i) {
>  		phys_addr_t paddr = sg_phys(sg);
> -		dma_addr_t dev_addr = xen_phys_to_bus(paddr);
> +		unsigned long vstart, order;
> +		dma_addr_t dev_addr;
> [...]
>  		if (swiotlb_force ||
>  		    !dma_capable(hwdev, dev_addr, sg->length) ||

How about swiotlb_map_page() (for the compound page case)?

Jan
Konrad Rzeszutek Wilk
2012-Dec-07 14:08 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
On Thu, Dec 06, 2012 at 09:08:42PM +0800, Dongxiao Xu wrote:
> While mapping sg buffers, checking to cross page DMA buffer is
> also needed. If the guest DMA buffer crosses page boundary, Xen
> should exchange contiguous memory for it.

So this is when we cross those 2MB contiguous swaths of buffers.
Wouldn't we get the same problem with the 'map_page' call? If the
driver tried to map say a 4MB DMA region?

What if this check was done in the routines that provide the
software static buffers and there try to provide a nice DMA
contiguous swath of pages?

> Besides, it is needed to backup the original page contents
> and copy it back after memory exchange is done.
>
> This fixes issues if device DMA into software static buffers,
> and in case the static buffer cross page boundary which pages are
> not contiguous in real hardware.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
> [...]
> -- 
> 1.7.1
>
Konrad Rzeszutek Wilk
2012-Dec-07 14:11 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
On Thu, Dec 06, 2012 at 01:37:41PM +0000, Jan Beulich wrote:
> >>> On 06.12.12 at 14:08, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> > While mapping sg buffers, checking to cross page DMA buffer is
> > also needed. If the guest DMA buffer crosses page boundary, Xen
> > should exchange contiguous memory for it.
> > [...]
> > +static bool
> > +check_continguous_region(unsigned long vstart, unsigned long order)
>
> check_continguous_region(unsigned long vstart, unsigned int order)
>
> But - why do you need to do this check order based in the first
> place? Checking the actual length of the buffer should suffice.
>
> > +{
> > +	unsigned long prev_ma = xen_virt_to_bus((void *)vstart);
> > +	unsigned long next_ma;
>
> phys_addr_t or some such for both of them.
>
> > +	int i;
>
> unsigned long
>
> > +
> > +	for (i = 1; i < (1 << order); i++) {
>
> 1UL
> [...]
> >  		if (swiotlb_force ||
> >  		    !dma_capable(hwdev, dev_addr, sg->length) ||
>
> How about swiotlb_map_page() (for the compound page case)?

Heh. Thanks - I just got to your reply now and had the same question.

Interestingly enough - this looks like a problem that has been there
forever and nobody ever hit it. Worse, the problem is even present if
a driver uses pci_alloc_coherent and asks for a 3MB region or such -
as we can at most give out only 2MB swaths.

> Jan
Xu, Dongxiao
2012-Dec-11 06:27 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, December 06, 2012 9:38 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; konrad.wilk@oracle.com; linux-kernel@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>
> >>> On 06.12.12 at 14:08, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> > [...]
> > +static bool
> > +check_continguous_region(unsigned long vstart, unsigned long order)
>
> check_continguous_region(unsigned long vstart, unsigned int order)
>
> But - why do you need to do this check order based in the first
> place? Checking the actual length of the buffer should suffice.

Thanks, the word "continguous" is mistyped in the function name; it
should be "contiguous".

check_contiguous_region() is used to check whether the pages are
contiguous in hardware. The length only indicates whether the buffer
crosses a page boundary. If the buffer crosses pages and they are not
contiguous in hardware, we do need to exchange memory in Xen.

> > +{
> > +	unsigned long prev_ma = xen_virt_to_bus((void *)vstart);
> > +	unsigned long next_ma;
>
> phys_addr_t or some such for both of them.

Thanks. Should it be dma_addr_t?

> > +	int i;
>
> unsigned long

Thanks.

> > +
> > +	for (i = 1; i < (1 << order); i++) {
>
> 1UL

Thanks.

> > [...]
> >  		if (swiotlb_force ||
> >  		    !dma_capable(hwdev, dev_addr, sg->length) ||
>
> How about swiotlb_map_page() (for the compound page case)?

Yes! This should also need similar handling.

One thing that needs further consideration: the above approach
introduces two memory copies, which has a race condition - while we
are exchanging/copying pages, dom0 may visit other elements right in
those pages.

One choice is to move the memory copy into the hypervisor, which
requires us to modify the XENMEM_exchange hypercall and add certain
flags indicating whether the exchange needs memory copying.

Another choice is to solve this issue on the driver side, avoiding
DMA into such static buffers. This is easy to modify in one driver,
but it may be difficult to monitor so many device drivers.

Thanks,
Dongxiao

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Xu, Dongxiao
2012-Dec-11 06:39 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, December 07, 2012 10:09 PM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>
> On Thu, Dec 06, 2012 at 09:08:42PM +0800, Dongxiao Xu wrote:
> > While mapping sg buffers, checking to cross page DMA buffer is also
> > needed. If the guest DMA buffer crosses page boundary, Xen should
> > exchange contiguous memory for it.
>
> So this is when we cross those 2MB contiguous swaths of buffers.
> Wouldn't we get the same problem with the 'map_page' call? If the
> driver tried to map say a 4MB DMA region?

Yes, it also needs such a check, as I just replied in Jan's mail.

> What if this check was done in the routines that provide the software
> static buffers and there try to provide a nice DMA contiguous swath
> of pages?

Yes, this approach also came to our mind, but it needs to modify the
driver itself. If so, it requires the driver not to use such static
buffers (e.g., from kmalloc) for DMA even if the buffer is contiguous
natively. Is this acceptable by kernel/driver upstream?

Thanks,
Dongxiao

> [...]
> > -- 
> > 1.7.1
> >
Konrad Rzeszutek Wilk
2012-Dec-11 17:06 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
On Tue, Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Friday, December 07, 2012 10:09 PM
> > [...]
> > On Thu, Dec 06, 2012 at 09:08:42PM +0800, Dongxiao Xu wrote:
> > > While mapping sg buffers, checking to cross page DMA buffer is also
> > > needed. If the guest DMA buffer crosses page boundary, Xen should
> > > exchange contiguous memory for it.
> >
> > So this is when we cross those 2MB contiguous swaths of buffers.
> > Wouldn't we get the same problem with the 'map_page' call? If the
> > driver tried to map say a 4MB DMA region?
>
> Yes, it also needs such a check, as I just replied in Jan's mail.
>
> > What if this check was done in the routines that provide the software
> > static buffers and there try to provide a nice DMA contiguous swath
> > of pages?
>
> Yes, this approach also came to our mind, but it needs to modify the
> driver itself. If so, it requires the driver not to use such static
> buffers (e.g., from kmalloc) for DMA even if the buffer is contiguous
> natively.

I am a bit lost here.

Is the issue you found only with drivers that do not use the DMA API?
Can you perhaps point me to the code that triggered this fix in the
first place?

> Is this acceptable by kernel/driver upstream?

I am still not completely clear on what you had in mind. The one
method I thought about that might help is to have Xen-SWIOTLB track
which memory ranges were exchanged (so xen_swiotlb_fixup would save
the *buf and the size for each call to xen_create_contiguous_region
in a list or array).

When xen_swiotlb_map_page/xen_swiotlb_map_sg_attrs are called they
would consult said array/list to see if the region they retrieved
crosses said 2MB chunks. If so... and here I am unsure of what would
be the best way to proceed.
Xu, Dongxiao
2012-Dec-12 01:03 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, December 12, 2012 1:07 AM
> To: Xu, Dongxiao
> Cc: xen-devel@lists.xen.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>
> On Tue, Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
> > [...]
> > Yes, this approach also came to our mind, but it needs to modify the
> > driver itself. If so, it requires the driver not to use such static
> > buffers (e.g., from kmalloc) for DMA even if the buffer is contiguous
> > natively.
>
> I am a bit lost here.
>
> Is the issue you found only with drivers that do not use the DMA API?
> Can you perhaps point me to the code that triggered this fix in the
> first place?

Yes, we met this issue on a specific SAS device/driver, and it calls
into the libata-core code; you can refer to the function
ata_dev_read_id() called from ata_dev_reread_id() in
drivers/ata/libata-core.c.

In the above function, the target buffer is
(void *)dev->link->ap->sector_buf, which is a 512-byte static buffer
that unfortunately crosses the page boundary.

> > Is this acceptable by kernel/driver upstream?
>
> I am still not completely clear on what you had in mind. [...]

We thought we could solve the issue in several ways:

1) Like the previous patch I sent out, we check the DMA region in
xen_swiotlb_map_page() and xen_swiotlb_map_sg_attrs(), and if the DMA
region crosses a page boundary, we exchange the memory and copy the
content. However, this has a race condition: while copying the memory
content (we introduced two memory copies in the patch), some other
code may also visit the page and encounter incorrect values.

2) Mostly the same as item 1); the only difference is that we put the
memory content copy inside the Xen hypervisor instead of in Dom0.
This requires adding certain flags indicating memory moving to the
XENMEM_exchange hypercall.

3) As you also mentioned, this is not a common case; it is only
triggered by some specific devices/drivers. We can fix it in certain
drivers to avoid DMA into static buffers, like
(void *)dev->link->ap->sector_buf in the above case. But I am not
sure whether that is acceptable by kernel/driver upstream.

Thanks,
Dongxiao
Jan Beulich
2012-Dec-12 09:38 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>>> On 12.12.12 at 02:03, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> On Tue, Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
>> > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> > > What if this check was done in the routines that provide the
>> > > software static buffers and there try to provide a nice DMA
>> > > contiguous swath of pages?
>> >
>> > Yes, this approach also came to our mind, but it needs to modify the
>> > driver itself. If so, it requires the driver not to use such static
>> > buffers (e.g., from kmalloc) for DMA even if the buffer is contiguous
>> > natively.
>>
>> I am a bit lost here.
>>
>> Is the issue you found only with drivers that do not use the DMA API?
>> Can you perhaps point me to the code that triggered this fix in the
>> first place?
>
> Yes, we met this issue on a specific SAS device/driver, and it calls
> into the libata-core code; you can refer to the function
> ata_dev_read_id() called from ata_dev_reread_id() in
> drivers/ata/libata-core.c.
>
> In the above function, the target buffer is
> (void *)dev->link->ap->sector_buf, which is a 512-byte static buffer
> that unfortunately crosses the page boundary.

I wonder whether such use of sg_init_one()/sg_set_buf() is correct in
the first place. While there aren't any restrictions documented for
its use, one clearly can't pass in whatever one wants (a pointer into
vmalloc()-ed memory, for instance, won't work afaict). I didn't go
through all other users of it, but quite a few of the uses elsewhere
look similarly questionable.

>> I am still not completely clear on what you had in mind. [...]
>
> We thought we could solve the issue in several ways:
>
> 1) Like the previous patch I sent out, we check the DMA region in
> xen_swiotlb_map_page() and xen_swiotlb_map_sg_attrs(), and if the DMA
> region crosses a page boundary, we exchange the memory and copy the
> content. However, this has a race condition: while copying the memory
> content (we introduced two memory copies in the patch), some other
> code may also visit the page and encounter incorrect values.

That's why, after mapping a buffer (or SG list), one has to call the
sync functions before looking at data. Any race as described by you
is therefore a programming error.

Jan
Konrad Rzeszutek Wilk
2012-Dec-13 16:34 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
On Wed, Dec 12, 2012 at 01:03:55AM +0000, Xu, Dongxiao wrote:
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Wednesday, December 12, 2012 1:07 AM
> > To: Xu, Dongxiao
> > Cc: xen-devel@lists.xen.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg
> > hook
> >
> > On Tue, Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
> > > > -----Original Message-----
> > > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > > > Sent: Friday, December 07, 2012 10:09 PM
> > > > To: Xu, Dongxiao
> > > > Cc: xen-devel@lists.xen.org; linux-kernel@vger.kernel.org
> > > > Subject: Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for
> > > > map_sg hook
> > > >
> > > > On Thu, Dec 06, 2012 at 09:08:42PM +0800, Dongxiao Xu wrote:
> > > > > While mapping sg buffers, checking to cross page DMA buffer is
> > > > > also needed. If the guest DMA buffer crosses page boundary, Xen
> > > > > should exchange contiguous memory for it.
> > > >
> > > > So this is when we cross those 2MB contingous swatch of buffers.
> > > > Wouldn't we get the same problem with the 'map_page' call? If the
> > > > driver tried to map say a 4MB DMA region?
> > >
> > > Yes, it also needs such check, as I just replied to Jan's mail.
> > >
> > > >
> > > > What if this check was done in the routines that provide the
> > > > software static buffers and there try to provide a nice DMA contingous
> > swatch of pages?
> > >
> > > Yes, this approach also came to our mind, which needs to modify the driver
> > itself.
> > > If so, it requires driver not using such static buffers (e.g., from
> > > kmalloc) to do DMA even if the buffer is continuous in native.
> >
> > I am bit loss here.
> >
> > Is the issue you found only with drivers that do not use DMA API?
> > Can you perhaps point me to the code that triggered this fix in the first place?
>
> Yes, we met this issue on a specific SAS device/driver, and it calls into
> libata-core code, you can refer to function ata_dev_read_id() called from
> ata_dev_reread_id() in drivers/ata/libata-core.c.
>
> In the above function, the target buffer is (void *)dev->link->ap->sector_buf,
> which is 512 bytes static buffer and unfortunately it across the page
> boundary.

Hm, that looks like a DMA API violation. If you ran the code with
CONFIG_DMA_API_DEBUG did it complain about this? I recall the floppy
driver doing something similar and the Debug DMA API was quite verbose
in pointing this out.

>
> > > Is this acceptable by kernel/driver upstream?
> >
> > I am still not completely clear on what you had in mind. The one method I
> > thought about that might help in this is to have Xen-SWIOTLB track which
> > memory ranges were exchanged (so xen_swiotlb_fixup would save the *buf
> > and the size for each call to xen_create_contiguous_region in a list or array).
> >
> > When xen_swiotlb_map/xen_swiotlb_map_sg_attrs are called they would
> > consult said array/list to see if the region they retrieved crosses said 2MB
> > chunks. If so.. and here I am unsure of what would be the best way to proceed.
>
> We thought we can solve the issue in several ways:
>
> 1) Like the previous patch I sent out, we check the DMA region in
> xen_swiotlb_map_page() and xen_swiotlb_map_sg_attr(), and if DMA region
> crosses page boundary, we exchange the memory and copy the content. However
> it has race condition that when copying the memory content (we introduced two
> memory copies in the patch), some other code may also visit the page, which
> may encounter incorrect values.
> 2) Mostly the same as item 1), the only difference is that we put the memory
> content copy inside Xen hypervisor but not in Dom0. This requires we add
> certain flags to indicate memory moving in the XENMEM_exchange hypercall.
> 3) As you also mentioned, this is not a common case, it is only triggered by
> some specific devices/drivers. We can fix it in certain drivers to avoid DMA
> into static buffers. Like (void *)dev->link->ap->sector_buf in the above
> case. But I am not sure whether it is acceptable by kernel/driver upstream.
>
> Thanks,
> Dongxiao
Konrad Rzeszutek Wilk
2012-Dec-19 20:09 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
On Wed, Dec 12, 2012 at 09:38:23AM +0000, Jan Beulich wrote:
> >>> On 12.12.12 at 02:03, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> On Tue, Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
> >> > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> > > What if this check was done in the routines that provide the
> >> > > software static buffers and there try to provide a nice DMA contingous
> >> swatch of pages?
> >> >
> >> > Yes, this approach also came to our mind, which needs to modify the driver
> >> itself.
> >> > If so, it requires driver not using such static buffers (e.g., from
> > kmalloc) to do
> >> DMA even if the buffer is continuous in native.
> >>
> >> I am bit loss here.
> >>
> >> Is the issue you found only with drivers that do not use DMA API?
> >> Can you perhaps point me to the code that triggered this fix in the first
> > place?
> >
> > Yes, we met this issue on a specific SAS device/driver, and it calls into
> > libata-core code, you can refer to function ata_dev_read_id() called from
> > ata_dev_reread_id() in drivers/ata/libata-core.c.
> >
> > In the above function, the target buffer is (void *)dev->link->ap->sector_buf,
> > which is 512 bytes static buffer and unfortunately it across the page
> > boundary.
>
> I wonder whether such use of sg_init_one()/sg_set_buf() is correct
> in the first place. While there aren't any restrictions documented for
> its use, one clearly can't pass in whatever one wants (a pointer into
> vmalloc()-ed memory, for instance, won't work afaict).
>
> I didn't go through all other users of it, but quite a few of the uses
> elsewhere look similarly questionable.
>
> >> I am still not completely clear on what you had in mind.
> >> The one method I
> >> thought about that might help in this is to have Xen-SWIOTLB track which
> >> memory ranges were exchanged (so xen_swiotlb_fixup would save the *buf
> >> and the size for each call to xen_create_contiguous_region in a list or
> > array).
> >>
> >> When xen_swiotlb_map/xen_swiotlb_map_sg_attrs are called they would
> >> consult said array/list to see if the region they retrieved crosses said 2MB
> >> chunks. If so.. and here I am unsure of what would be the best way to
> > proceed.

And from finally looking at the code I misunderstood your initial description.

> >
> > We thought we can solve the issue in several ways:
> >
> > 1) Like the previous patch I sent out, we check the DMA region in
> > xen_swiotlb_map_page() and xen_swiotlb_map_sg_attr(), and if DMA region
> > crosses page boundary, we exchange the memory and copy the content. However
> > it has race condition that when copying the memory content (we introduced two
> > memory copies in the patch), some other code may also visit the page, which
> > may encounter incorrect values.
>
> That's why, after mapping a buffer (or SG list) one has to call the
> sync functions before looking at data. Any race as described by
> you is therefore a programming error.

I am with Jan here. It looks like the libata is depending on the dma_unmap
to do this. But it is unclear to me when the ata_qc_complete is actually
called to unmap the buffer (and hence sync it).

>
> Jan
>
Xu, Dongxiao
2012-Dec-20 01:23 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Thursday, December 20, 2012 4:09 AM
> To: Jan Beulich
> Cc: Xu, Dongxiao; xen-devel@lists.xen.org; linux-kernel@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH] xen/swiotlb: Exchange to contiguous memory
> for map_sg hook
>
> On Wed, Dec 12, 2012 at 09:38:23AM +0000, Jan Beulich wrote:
> > >>> On 12.12.12 at 02:03, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> > >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] On Tue,
> > >> Dec 11, 2012 at 06:39:35AM +0000, Xu, Dongxiao wrote:
> > >> > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > >> > > What if this check was done in the routines that provide the
> > >> > > software static buffers and there try to provide a nice DMA
> > >> > > contingous
> > >> swatch of pages?
> > >> >
> > >> > Yes, this approach also came to our mind, which needs to modify
> > >> > the driver
> > >> itself.
> > >> > If so, it requires driver not using such static buffers (e.g.,
> > >> > from
> > > kmalloc) to do
> > >> DMA even if the buffer is continuous in native.
> > >>
> > >> I am bit loss here.
> > >>
> > >> Is the issue you found only with drivers that do not use DMA API?
> > >> Can you perhaps point me to the code that triggered this fix in the
> > >> first
> > > place?
> > >
> > > Yes, we met this issue on a specific SAS device/driver, and it calls
> > > into libata-core code, you can refer to function ata_dev_read_id()
> > > called from
> > > ata_dev_reread_id() in drivers/ata/libata-core.c.
> > >
> > > In the above function, the target buffer is (void
> > > *)dev->link->ap->sector_buf, which is 512 bytes static buffer and
> > > unfortunately it across the page boundary.
> >
> > I wonder whether such use of sg_init_one()/sg_set_buf() is correct in
> > the first place.
> > While there aren't any restrictions documented for
> > its use, one clearly can't pass in whatever one wants (a pointer into
> > vmalloc()-ed memory, for instance, won't work afaict).
> >
> > I didn't go through all other users of it, but quite a few of the uses
> > elsewhere look similarly questionable.
> >
> > >> I am still not completely clear on what you had in mind. The one
> > >> method I thought about that might help in this is to have
> > >> Xen-SWIOTLB track which memory ranges were exchanged (so
> > >> xen_swiotlb_fixup would save the *buf and the size for each call to
> > >> xen_create_contiguous_region in a list or
> > > array).
> > >>
> > >> When xen_swiotlb_map/xen_swiotlb_map_sg_attrs are called they
> > >> would consult said array/list to see if the region they retrieved
> > >> crosses said 2MB chunks. If so.. and here I am unsure of what would
> > >> be the best way to
> > > proceed.
>
> And from finally looking at the code I misunderstood your initial description.
>
> > >
> > > We thought we can solve the issue in several ways:
> > >
> > > 1) Like the previous patch I sent out, we check the DMA region in
> > > xen_swiotlb_map_page() and xen_swiotlb_map_sg_attr(), and if DMA
> > > region crosses page boundary, we exchange the memory and copy the
> > > content. However it has race condition that when copying the memory
> > > content (we introduced two memory copies in the patch), some other
> > > code may also visit the page, which may encounter incorrect values.
> >
> > That's why, after mapping a buffer (or SG list) one has to call the
> > sync functions before looking at data. Any race as described by you is
> > therefore a programming error.
>
> I am with Jan here. It looks like the libata is depending on the dma_unmap to
> do this. But it is unclear to me when the ata_qc_complete is actually called to
> unmap the buffer (and hence sync it).

Sorry, maybe I am still not describing this issue clearly.
Take the libata case as an example: the static DMA buffer
(dev->link->ap->sector_buf, here we use Data Structure B in the graph)
is located in the following structure:

-------------------------------------Page boundary
<Data Structure A>
<Data Structure B>
-------------------------------------Page boundary
<Data Structure B (cross page)>
<Data Structure C>
-------------------------------------Page boundary

Where Structure B is our DMA target.

For Data Structure B, we didn't care about the simultaneous access; either
lock or sync function will take care of it.
What we are not sure about is read/write of A and C from another processor.
As we will have memory copy for the pages, and at the same time, another
CPU may access A/C.

Thanks,
Dongxiao

> >
> > Jan
> >
Jan Beulich
2012-Dec-20 08:56 UTC
Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
>>> On 20.12.12 at 02:23, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
> Sorry, maybe I am still not describing this issue clearly.

No, at least I understood you the way you re-describe below.

> Take the libata case as an example, the static DMA buffer locates
> (dev->link->ap->sector_buf, here we use Data Structure B in the graph) in
> the following structure:
>
> -------------------------------------Page boundary
> <Data Structure A>
> <Data Structure B>
> -------------------------------------Page boundary
> <Data Structure B (cross page)>
> <Data Structure C>
> -------------------------------------Page boundary
>
> Where Structure B is our DMA target.
>
> For Data Structure B, we didn't care about the simultaneous access, either
> lock or sync function will take care of it.
> What we are not sure is "read/write of A and C from other processor". As we
> will have memory copy for the pages, and at the same time, other CPU may
> access A/C.

The question is whether what libata does here is valid in the first
place - fill an SG list entry with something that crosses a page
boundary and is not a compound page.

Jan