Konrad Rzeszutek Wilk
2009-Oct-01 17:35 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On Wed, Sep 30, 2009 at 11:21:49AM -0400, Konrad Rzeszutek Wilk wrote:> > With those two options defined in the boot line, the radeon kernel driver spits out: > > [ 244.190885] [drm] Loading RV730/RV740 PFP Microcode > [ 244.190911] [drm] Loading RV730/RV740 CP Microcode > [ 244.205974] [drm] Resetting GPU > [ 244.310103] [drm] writeback test failed > [ 251.220092] [drm] Resetting GPU > > And the video is useless (red little characters at the top, nothing else). > If I don''t define those two arguments the video driver works fine... snip ..> > Right now I am instrumenting the code and trying to narrow > down the issue. But I was wondering if somebody has seen this in the past > and can say "Oh yeah, I saw that back in 1980, try this."I found the fault, which I am going to try to describe. The radeon_dri driver (user-land) via ioctl, calls ''drm_sg_alloc''. ''drm_sg_alloc'' allocates a 32MB area using ''vmalloc_32''. On non-Xen machines, the physical address is indeed under the 4GB number. On Xen thought the physical address is somewhere in the stratosphere. The radeon_dri saves this vmalloc-ed virtual address in its structures. Next step is bit more complex. radeon_dri via ioctl calls the radeon_cp_init which job is to initialize the command processing engine. The passed in arguments are the virtual address obtained via ''drm_sg_alloc''. That 32MB at that point is partitioned (the first 1MB is for the command processing ring, the next 64KB for a ring pointer, and so on) and both the user-land driver and the kernel save this address in their structures. Keep in mind that this address is the virtual address pointing to the vmalloc-ed area. Not the virtual address obtained from page_address(vmalloc_to_page(i))). Then the radeon_cp_init sets up a GPU page-table using the physical addresses. This is where it starts to fall apart, as during this setup, the r600_page_table_init calls the pci_map_page to ensure that the physical address is below 4GB. On Xen (and on non-Xen with swiotlb=force iommu=soft), the physical addresses obtained end up being taken from the IOMMU. This is important. After that the writeback is done on the MMIO region, the GPU happily looks at the page-table, finds the physicall address (which is _not_ pointing to the vmalloc area, but to the IOMMU) and does a write to that memory region. While the writeback test reads data from saved ring_rptr (which reads memory from the vmalloc area - for which the GPU has no page-table). Solutions?: 1). The current workaround is to boot Xen with mem=3GB which will enforce the vmalloc area is under 4GB. 2). I experiemented a bit with drm_sg_alloc to use the dma_alloc_coherent but 32MB is way too big for it: if (order >= MAX_ORDER) { WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); (MAX_ORDER is defined to be 11, and 32MB is order 15). 3). Restructure r600_cp_init to refresh its structures when the page table uses different areas than the physical addresses. That would imply keeping a copy of the "old" vmalloc unused virtual addresses (the user-land driver uses the virtual address as handle) and updating the ring pointer, command pointer, etc to use the virtual addresses obtained from doing phys_to_virt(page_to_phys(entry->pagelist[X])). But the code assumes (and righly) that the area obtained from drm_sg_alloc is 32-bit addressable, so.. 3). Ensuring the sg_drm_alloc will get an area with physical addresses under the 4G mark. I was thinking to utlilize the xen_create_contigous_region, but the PFNs returned by vmalloc_32 are actually MFNs, so xen_create_contigous_region is out of the picture. 4). Go a further step back. Ensure that vmalloc_32 will always get addresses under the 4GB mark? Since the virt_to_pfn for these addresses are the real physical addresses, perhaps make this function enforce the MFNs to be under 4GB mark? 5). Other suggestions? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Oct-01 19:07 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On 10/01/09 10:35, Konrad Rzeszutek Wilk wrote:> On Wed, Sep 30, 2009 at 11:21:49AM -0400, Konrad Rzeszutek Wilk wrote: > >> With those two options defined in the boot line, the radeon kernel driver spits out: >> >> [ 244.190885] [drm] Loading RV730/RV740 PFP Microcode >> [ 244.190911] [drm] Loading RV730/RV740 CP Microcode >> [ 244.205974] [drm] Resetting GPU >> [ 244.310103] [drm] writeback test failed >> [ 251.220092] [drm] Resetting GPU >> >> And the video is useless (red little characters at the top, nothing else). >> If I don''t define those two arguments the video driver works fine. >> > .. snip .. > >> Right now I am instrumenting the code and trying to narrow >> down the issue. But I was wondering if somebody has seen this in the past >> and can say "Oh yeah, I saw that back in 1980, try this." >> > I found the fault, which I am going to try to describe. > > The radeon_dri driver (user-land) via ioctl, calls ''drm_sg_alloc''. > ''drm_sg_alloc'' allocates a 32MB area using ''vmalloc_32''. On non-Xen > machines, the physical address is indeed under the 4GB number. On Xen > thought the physical address is somewhere in the stratosphere. The radeon_dri > saves this vmalloc-ed virtual address in its structures. > > Next step is bit more complex. radeon_dri via ioctl calls the radeon_cp_init > which job is to initialize the command processing engine. The passed > in arguments are the virtual address obtained via ''drm_sg_alloc''. That 32MB > at that point is partitioned (the first 1MB is for the command processing ring, > the next 64KB for a ring pointer, and so on) and both the user-land driver > and the kernel save this address in their structures. Keep in mind that this > address is the virtual address pointing to the vmalloc-ed area. Not the > virtual address obtained from page_address(vmalloc_to_page(i))). > > Then the radeon_cp_init sets up a GPU page-table using the physical > addresses. This is where it starts to fall apart, as during this > setup, the r600_page_table_init calls the pci_map_page to ensure that the > physical address is below 4GB. On Xen (and on non-Xen with swiotlb=force > iommu=soft), the physical addresses obtained end up being taken > from the IOMMU. This is important. >(By IOMMU, do you mean swiotlb? Or via dma_ops?) There''s no particular reason why this needs to be swiotlb''d. pci_map_page should be calling xen_map_page, which should be calling xen_create_contiguous_region() on it to make sure the underlying memory is phyisically in the right place. The tricky part is making sure all the kernel/vmalloc aliases are dealt with properly (and hope there are no usermode mappings at that point, or we''ll have to start rummaging through rmap).> After that the writeback is done on the MMIO region, the GPU > happily looks at the page-table, finds the physicall address (which > is _not_ pointing to the vmalloc area, but to the IOMMU) and does a > write to that memory region. While the writeback test reads data from > saved ring_rptr (which reads memory from the vmalloc area - for which > the GPU has no page-table). >That means the radeon driver is misusing the DMA API. If the page is mapped to the device you can''t start accessing it from the CPU without unmapping it, or at least syncing it.> Solutions?: > 1). The current workaround is to boot Xen with mem=3GB which will > enforce the vmalloc area is under 4GB. > > 2). I experiemented a bit with drm_sg_alloc to use the dma_alloc_coherent > but 32MB is way too big for it: > > if (order >= MAX_ORDER) { > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > > (MAX_ORDER is defined to be 11, and 32MB is order 15). >Could modify drm_vmalloc_dma to do the vmalloc "manually": 1. call __get_vm_area to reserve a chunk of vmalloc address space 2. allocate a bunch of individual pages with dma_alloc_coherent 3. insert them into the vmalloc mapping with map_vm_area That will guarantee a normal-looking vmalloc area with device-friendly pages that subsequent pci_map_page operations will use as-is.> 3). Restructure r600_cp_init to refresh its structures when the page > table uses different areas than the physical addresses. That would > imply keeping a copy of the "old" vmalloc unused virtual addresses > (the user-land driver uses the virtual address as handle) and updating > the ring pointer, command pointer, etc to use the virtual addresses > obtained from doing phys_to_virt(page_to_phys(entry->pagelist[X])). > But the code assumes (and righly) that the area obtained from > drm_sg_alloc is 32-bit addressable, so.. >Ugh. I think we could make the argument that this driver is broken for not syncing its mapped pages before accessing them - but if you''re doing graphics via swiotlb you''re in a world of pain anyway. And we want a more general fix because other drivers are likely to be doing equally dubious things.> 3). Ensuring the sg_drm_alloc will get an area with physical addresses > under the 4G mark. I was thinking to utlilize the xen_create_contigous_region, > but the PFNs returned by vmalloc_32 are actually MFNs, so > xen_create_contigous_region is out of the picture. >Not sure I follow you here. But I think this is the right approach, using the algorithm above.> 4). Go a further step back. Ensure that vmalloc_32 will always get > addresses under the 4GB mark? Since the virt_to_pfn for these addresses > are the real physical addresses, perhaps make this function enforce > the MFNs to be under 4GB mark? >I was thinking along those lines, but its hard to see how to do this without mucking around core vmalloc code. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2009-Oct-01 19:21 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On Thu, Oct 01, 2009 at 12:07:22PM -0700, Jeremy Fitzhardinge wrote:> On 10/01/09 10:35, Konrad Rzeszutek Wilk wrote: > > On Wed, Sep 30, 2009 at 11:21:49AM -0400, Konrad Rzeszutek Wilk wrote: > > > >> With those two options defined in the boot line, the radeon kernel driver spits out: > >> > >> [ 244.190885] [drm] Loading RV730/RV740 PFP Microcode > >> [ 244.190911] [drm] Loading RV730/RV740 CP Microcode > >> [ 244.205974] [drm] Resetting GPU > >> [ 244.310103] [drm] writeback test failed > >> [ 251.220092] [drm] Resetting GPU > >> > >> And the video is useless (red little characters at the top, nothing else). > >> If I don''t define those two arguments the video driver works fine. > >> > > .. snip .. > > > >> Right now I am instrumenting the code and trying to narrow > >> down the issue. But I was wondering if somebody has seen this in the past > >> and can say "Oh yeah, I saw that back in 1980, try this." > >> > > I found the fault, which I am going to try to describe. > > > > The radeon_dri driver (user-land) via ioctl, calls ''drm_sg_alloc''. > > ''drm_sg_alloc'' allocates a 32MB area using ''vmalloc_32''. On non-Xen > > machines, the physical address is indeed under the 4GB number. On Xen > > thought the physical address is somewhere in the stratosphere. The radeon_dri > > saves this vmalloc-ed virtual address in its structures. > > > > Next step is bit more complex. radeon_dri via ioctl calls the radeon_cp_init > > which job is to initialize the command processing engine. The passed > > in arguments are the virtual address obtained via ''drm_sg_alloc''. That 32MB > > at that point is partitioned (the first 1MB is for the command processing ring, > > the next 64KB for a ring pointer, and so on) and both the user-land driver > > and the kernel save this address in their structures. Keep in mind that this > > address is the virtual address pointing to the vmalloc-ed area. Not the > > virtual address obtained from page_address(vmalloc_to_page(i))). > > > > Then the radeon_cp_init sets up a GPU page-table using the physical > > addresses. This is where it starts to fall apart, as during this > > setup, the r600_page_table_init calls the pci_map_page to ensure that the > > physical address is below 4GB. On Xen (and on non-Xen with swiotlb=force > > iommu=soft), the physical addresses obtained end up being taken > > from the IOMMU. This is important. > > > (By IOMMU, do you mean swiotlb? Or via dma_ops?)dma_ops. On non-Xen I forced it to use the swiotlb.> > There''s no particular reason why this needs to be swiotlb''d.Absolutely. Especially as video driver programs the graphic card page-table to do the translations - it is an IOMMU by itself!> pci_map_page should be calling xen_map_page, which should be calling > xen_create_contiguous_region() on it to make sure the underlying memory > is phyisically in the right place. The tricky part is making sure all > the kernel/vmalloc aliases are dealt with properly (and hope there are > no usermode mappings at that point, or we''ll have to start rummaging > through rmap). > > > After that the writeback is done on the MMIO region, the GPU > > happily looks at the page-table, finds the physicall address (which > > is _not_ pointing to the vmalloc area, but to the IOMMU) and does a > > write to that memory region. While the writeback test reads data from > > saved ring_rptr (which reads memory from the vmalloc area - for which > > the GPU has no page-table). > > > > That means the radeon driver is misusing the DMA API. If the page is > mapped to the device you can''t start accessing it from the CPU without > unmapping it, or at least syncing it.Well, it is a bit schizophrenic. The driver sets up the GPU page-table for the 32-bit compatible DMA space using the pci_map_page, and that is what the GPU uses. But the kernel radeon driver assumes that the array of vmalloc virtual address map to the same exact physicall addresses of the page as what was passed to pci_map_page. In other words, what was passed in to pci_map_page is assumed to do a "return page_to_phys(addr);'' operation and nothing more. This is by the virtue that drm_sg_alloc calls ''vmalloc_32'' which is suppose to give a 32-bit accessible set of memory.> > 2). I experiemented a bit with drm_sg_alloc to use the dma_alloc_coherent > > but 32MB is way too big for it: > > > > if (order >= MAX_ORDER) { > > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN)); > > > > (MAX_ORDER is defined to be 11, and 32MB is order 15). > > > > Could modify drm_vmalloc_dma to do the vmalloc "manually": > > 1. call __get_vm_area to reserve a chunk of vmalloc address space > 2. allocate a bunch of individual pages with dma_alloc_coherent > 3. insert them into the vmalloc mapping with map_vm_area > > That will guarantee a normal-looking vmalloc area with device-friendly > pages that subsequent pci_map_page operations will use as-is.That is an idea I hadn''t thought off. Let me try that. .. snip ..> > 3). Ensuring the sg_drm_alloc will get an area with physical addresses > > under the 4G mark. I was thinking to utlilize the xen_create_contigous_region, > > but the PFNs returned by vmalloc_32 are actually MFNs, so > > xen_create_contigous_region is out of the picture. > > > > Not sure I follow you here. But I think this is the right approach, > using the algorithm above.True. Let me try that.> > > 4). Go a further step back. Ensure that vmalloc_32 will always get > > addresses under the 4GB mark? Since the virt_to_pfn for these addresses > > are the real physical addresses, perhaps make this function enforce > > the MFNs to be under 4GB mark? > > > > I was thinking along those lines, but its hard to see how to do this > without mucking around core vmalloc code.The are other users of ''vmalloc_32'' that look like they depend on this memory being under the 4GB mark. Most of them are do video capture through USB - so it probably is limited to only accessing up to 4GB.> > J_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Oct-01 19:37 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On 10/01/09 12:07, Jeremy Fitzhardinge wrote:> Could modify drm_vmalloc_dma to do the vmalloc "manually": > > 1. call __get_vm_area to reserve a chunk of vmalloc address space > 2. allocate a bunch of individual pages with dma_alloc_coherent > 3. insert them into the vmalloc mapping with map_vm_area > > That will guarantee a normal-looking vmalloc area with device-friendly > pages that subsequent pci_map_page operations will use as-is. >Like this (untested): diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c index c7823c8..73bfa63 100644 --- a/drivers/gpu/drm/drm_scatter.c +++ b/drivers/gpu/drm/drm_scatter.c @@ -32,16 +32,60 @@ */ #include <linux/vmalloc.h> +#include <linux/mm.h> #include "drmP.h" #define DEBUG_SCATTER 0 -static inline void *drm_vmalloc_dma(unsigned long size) +static inline void *drm_vmalloc_dma(struct drm_device *drmdev, unsigned long size) { #if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL | _PAGE_NO_CACHE); #else - return vmalloc_32(size); + struct device *dev = &drmdev->pdev->dev; + struct vm_struct *vma; + struct page **pages; + const int npages = PFN_UP(size); + int i; + + pages = kmalloc(npages * sizeof(*pages), GFP_KERNEL); + if (!pages) + goto out_free_pagearr; + + vma = __get_vm_area(size, VM_ALLOC, VMALLOC_START, VMALLOC_END); + if (!vma) + goto out_release_vma; + + for (i = 0; i < npages; i++) { + dma_addr_t phys; + void *addr; + addr = dma_alloc_coherent(dev, PAGE_SIZE, &phys, GFP_KERNEL); + if (addr == NULL) + goto out_free_pages; + + pages[i] = virt_to_page(addr); + } + + if (map_vm_area(vma, PAGE_KERNEL, &pages)) + goto out_free_pages; + + kfree(pages); + + return vma->addr; + +out_free_pages: + while(i > 0) { + void *addr = page_address(pages[--i]); + dma_free_coherent(dev, PAGE_SIZE, addr, virt_to_bus(addr)); + } + +out_release_vma: + vunmap(vma->addr); + +out_free_pagearr: + kfree(pages); + + return NULL; #endif } @@ -107,7 +151,7 @@ int drm_sg_alloc(struct drm_device *dev, struct drm_scatter_gather * request) } memset((void *)entry->busaddr, 0, pages * sizeof(*entry->busaddr)); - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); + entry->virtual = drm_vmalloc_dma(dev, pages << PAGE_SHIFT); if (!entry->virtual) { kfree(entry->busaddr); kfree(entry->pagelist); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Oct-01 20:03 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On 10/01/09 12:21, Konrad Rzeszutek Wilk wrote:> The are other users of ''vmalloc_32'' that look like they depend on this > memory being under the 4GB mark. Most of them are do video capture through > USB - so it probably is limited to only accessing up to 4GB. >Hm. The only sane way I can see of doing that is to make sure all the pages in the DMA32 zone are below 4G. But that''s going to get difficult if you want more than just dom0 to support device access... J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2009-Oct-02 12:52 UTC
[Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 01.10.09 21:21 >>> >The are other users of ''vmalloc_32'' that look like they depend on this >memory being under the 4GB mark. Most of them are do video capture through >USB - so it probably is limited to only accessing up to 4GB.I just went through all of the users of vmalloc_32(), and more than half of them seem bogus (like some legitimate use of it got cloned many times without really needing all the restrictions that come with this). Of course I can''t verify that I''m right with all of those, so I''m not sure how to proceed with trying to do some clean up here... Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2009-Oct-02 14:44 UTC
Re: [Xen-devel] Re: ATI radeon fails with "iommu=soft swiotlb=force" (seen on RV730/RV740 and RS780/RS800)
On Fri, Oct 02, 2009 at 01:52:46PM +0100, Jan Beulich wrote:> >>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 01.10.09 21:21 >>> > >The are other users of ''vmalloc_32'' that look like they depend on this > >memory being under the 4GB mark. Most of them are do video capture through > >USB - so it probably is limited to only accessing up to 4GB. > > I just went through all of the users of vmalloc_32(), and more than half of > them seem bogus (like some legitimate use of it got cloned many times > without really needing all the restrictions that come with this). Of course > I can''t verify that I''m right with all of those, so I''m not sure how to > proceed with trying to do some clean up here...I actually have some of those devices in the basement or I buy some of them at the MIT Flea. Will dig them up and make sure they work properly. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel