Alexandre Courbot
2014-Jun-27 11:22 UTC
[Nouveau] [PATCH v3 0/2] drm: nouveau: memory coherency for ARM
v2 was doing some pretty nasty things with the DMA API, so I took a different approach for this v3. As suggested, this version uses ttm_dma_populate() to populate BOs. The reason for doing this was that it would entitle us to using the DMA sync functions, but since the memory returned is already coherent anyway, we do not even need to call these functions anymore. So this series has turned into 2 small patches: - The first attempts to make it more obvious that Nouveau can use different ways to populate TTs, and make it possible to choose which method to use from a single place. - The second leverages this work to select the DMA allocator to populate TTs on ARM. Doing this solves all our coherency problems with Nouveau on Tegra, and hopefully makes the code easier to read in the process. Alexandre Courbot (2): drm/nouveau: cleanup TTM population logic drm/nouveau: use DMA TT population method on ARM drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++----------------- drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++ drivers/gpu/drm/nouveau/nouveau_ttm.c | 17 ++++++++++ 3 files changed, 61 insertions(+), 30 deletions(-) -- 2.0.0
Alexandre Courbot
2014-Jun-27 11:22 UTC
[Nouveau] [PATCH v3 1/2] drm/nouveau: cleanup TTM population logic
TTM pages can currently be populated using 3 different methods, but the code does not make it clear which one is used. The same complex conditions are tested in various parts of the code, which makes it difficult to follow and error prone. This patch introduces an enum into the nouveau_drm struct that indicates which population method will be used for TT-backed BOs. It is set once and for all during TTM initialization so the same conditions need not be tested again and again. This will also allow us to add new non-default TTM population cases by changing a single piece of code in nouveau_ttm_init() instead of all over nouveau_bo.c Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++----------------- drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++ drivers/gpu/drm/nouveau/nouveau_ttm.c | 15 +++++++++ 3 files changed, 59 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 4db886f9f793..82c6b479abcb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -28,7 +28,6 @@ */ #include <core/engine.h> -#include <linux/swiotlb.h> #include <subdev/fb.h> #include <subdev/vm.h> @@ -467,11 +466,11 @@ static struct ttm_tt * nouveau_ttm_tt_create(struct ttm_bo_device *bdev, unsigned long size, uint32_t page_flags, struct page *dummy_read) { -#if __OS_HAS_AGP +#if defined(TTM_HAS_AGP) struct nouveau_drm *drm = nouveau_bdev(bdev); struct drm_device *dev = drm->dev; - if (drm->agp.stat == ENABLED) { + if (drm->ttm.populate_method == AGP) { return ttm_agp_tt_create(bdev, dev->agp->bridge, size, page_flags, dummy_read); } @@ -524,21 +523,23 @@ nouveau_bo_init_mem_type(struct ttm_bo_device *bdev, uint32_t type, if (nv_device(drm->device)->card_type >= NV_50) man->func = &nouveau_gart_manager; else - if (drm->agp.stat != ENABLED) + if (drm->ttm.populate_method != AGP) man->func = &nv04_gart_manager; else man->func = &ttm_bo_manager_func; - if (drm->agp.stat == ENABLED) { - man->flags = TTM_MEMTYPE_FLAG_MAPPABLE; + man->flags = TTM_MEMTYPE_FLAG_MAPPABLE; + + if (drm->ttm.populate_method != AGP) + man->flags |= TTM_MEMTYPE_FLAG_CMA; + + if (drm->ttm.populate_method == POOL) { + man->available_caching = TTM_PL_MASK_CACHING; + man->default_caching = TTM_PL_FLAG_CACHED; + } else { man->available_caching = TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_WC; man->default_caching = TTM_PL_FLAG_WC; - } else { - man->flags = TTM_MEMTYPE_FLAG_MAPPABLE | - TTM_MEMTYPE_FLAG_CMA; - man->available_caching = TTM_PL_MASK_CACHING; - man->default_caching = TTM_PL_FLAG_CACHED; } break; @@ -1249,13 +1250,11 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem) /* System memory */ return 0; case TTM_PL_TT: -#if __OS_HAS_AGP - if (drm->agp.stat == ENABLED) { + if (drm->ttm.populate_method == AGP) { mem->bus.offset = mem->start << PAGE_SHIFT; mem->bus.base = drm->agp.base; mem->bus.is_iomem = !dev->agp->cant_use_aperture; } -#endif if (nv_device(drm->device)->card_type < NV_50 || !node->memtype) /* untiled */ break; @@ -1359,17 +1358,20 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm) device = nv_device(drm->device); dev = drm->dev; -#if __OS_HAS_AGP - if (drm->agp.stat == ENABLED) { + switch (drm->ttm.populate_method) { + case AGP: +#if defined(TTM_HAS_AGP) return ttm_agp_tt_populate(ttm); - } +#else + return 0; #endif + case DMA: + return ttm_dma_populate(ttm_dma, dev->dev); -#ifdef CONFIG_SWIOTLB - if (swiotlb_nr_tbl()) { - return ttm_dma_populate((void *)ttm, dev->dev); + case POOL: + /* POOL case is handled below */ + break; } -#endif r = ttm_pool_populate(ttm); if (r) { @@ -1409,19 +1411,20 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm) device = nv_device(drm->device); dev = drm->dev; -#if __OS_HAS_AGP - if (drm->agp.stat == ENABLED) { + switch (drm->ttm.populate_method) { + case AGP: +#if defined(TTM_HAS_AGP) ttm_agp_tt_unpopulate(ttm); - return; - } #endif - -#ifdef CONFIG_SWIOTLB - if (swiotlb_nr_tbl()) { - ttm_dma_unpopulate((void *)ttm, dev->dev); return; + case DMA: + ttm_dma_unpopulate(ttm_dma, dev->dev); + return; + + case POOL: + /* POOL case is handled below */ + break; } -#endif for (i = 0; i < ttm->num_pages; i++) { if (ttm_dma->dma_address[i]) { diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.h b/drivers/gpu/drm/nouveau/nouveau_drm.h index 7efbafaf7c1d..0d4b9eff8863 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.h +++ b/drivers/gpu/drm/nouveau/nouveau_drm.h @@ -103,6 +103,17 @@ struct nouveau_drm { struct ttm_mem_reg *, struct ttm_mem_reg *); struct nouveau_channel *chan; int mtrr; + /* + * How TTM objects are populated: + * POOL: use ttm_pool_populate() + * AGP: use ttm_agp_tt_populate() + * DMA: use ttm_dma_populate() + */ + enum { + POOL, + AGP, + DMA, + } populate_method; } ttm; /* GEM interface support */ diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index ab0228f640a5..70893416d4b6 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -24,6 +24,10 @@ * USE OR OTHER DEALINGS IN THE SOFTWARE. */ +#include <linux/dma-mapping.h> +#include <linux/swiotlb.h> +#include <linux/kconfig.h> + #include <subdev/fb.h> #include <subdev/vm.h> #include <subdev/instmem.h> @@ -375,6 +379,17 @@ nouveau_ttm_init(struct nouveau_drm *drm) DMA_BIT_MASK(32)); } + drm->ttm.populate_method = POOL; + +#ifdef CONFIG_SWIOTLB + if (swiotlb_nr_tbl()) + drm->ttm.populate_method = DMA; +#endif +#if defined(TTM_HAS_AGP) + if (drm->agp.stat == ENABLED) + drm->ttm.populate_method = AGP; +#endif + ret = nouveau_ttm_global_init(drm); if (ret) return ret; -- 2.0.0
Alexandre Courbot
2014-Jun-27 11:22 UTC
[Nouveau] [PATCH v3 2/2] drm/nouveau: use DMA TT population method on ARM
Cached memory accesses between the CPU and the GPU are not coherent on ARM. Use the DMA TTM allocator on this architecture to obtain coherent memory. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index 70893416d4b6..d77fc8a5167d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -385,6 +385,8 @@ nouveau_ttm_init(struct nouveau_drm *drm) if (swiotlb_nr_tbl()) drm->ttm.populate_method = DMA; #endif + if (IS_ENABLED(CONFIG_ARM)) + drm->ttm.populate_method = DMA; #if defined(TTM_HAS_AGP) if (drm->agp.stat == ENABLED) drm->ttm.populate_method = AGP; -- 2.0.0
Alexandre Courbot
2014-Jul-03 10:05 UTC
[Nouveau] [PATCH v3 0/2] drm: nouveau: memory coherency for ARM
While this series is more correct from the DMA API point of view, it is also much heavier as it strictly disables the use of any cache on all user-space mapped BOs, and is also much more restricted in terms of which memory it can use. I have a v4 in the works that lets us use TTM for user-mapped objects and only falls back to using the DMA API for kernel-only objects (fences & PBs basically). Will push it shortly but wanted to signal this revision as deprecated in the meantime. On Fri, Jun 27, 2014 at 8:22 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:> v2 was doing some pretty nasty things with the DMA API, so I took a different > approach for this v3. > > As suggested, this version uses ttm_dma_populate() to populate BOs. The reason > for doing this was that it would entitle us to using the DMA sync functions, > but since the memory returned is already coherent anyway, we do not even > need to call these functions anymore. > > So this series has turned into 2 small patches: > > - The first attempts to make it more obvious that Nouveau can use different > ways to populate TTs, and make it possible to choose which method to use from > a single place. > - The second leverages this work to select the DMA allocator to populate TTs > on ARM. > > Doing this solves all our coherency problems with Nouveau on Tegra, and > hopefully makes the code easier to read in the process. > > Alexandre Courbot (2): > drm/nouveau: cleanup TTM population logic > drm/nouveau: use DMA TT population method on ARM > > drivers/gpu/drm/nouveau/nouveau_bo.c | 63 ++++++++++++++++++----------------- > drivers/gpu/drm/nouveau/nouveau_drm.h | 11 ++++++ > drivers/gpu/drm/nouveau/nouveau_ttm.c | 17 ++++++++++ > 3 files changed, 61 insertions(+), 30 deletions(-) > > -- > 2.0.0 >
Maybe Matching Threads
- [PATCH v5 0/4] drm: nouveau: memory coherency on ARM
- [RFC] Remove AGP support from Radeon/Nouveau/TTM
- [PATCH v4 0/6] drm: nouveau: memory coherency on ARM
- [PATCH v2] drm/nouveau: support for platform devices
- [PATCH 3/4] drm/nouveau: stop using is_swiotlb_active