Alexandre Courbot
2015-Feb-26 03:44 UTC
[Nouveau] [PATCH] gem: allow user-space to specify an object should be coherent
User-space uses mappable BOs notably for fences, and expects that a value
update by the GPU will be immediately visible through the user-space mapping.

ARM has a property that may prevent this from happening though: memory can be
mapped multiple times only if the different mappings share the same caching
properties. However, all lowmem memory is already identity-mapped into the
kernel with cache enabled, so when user-space requests an uncached mapping, we
actually get an "undefined caching policy" one, and this has the strange
side-effects described in Freedesktop bug 86690.

To prevent this from happening, allow user-space to explicitly specify which
objects should be coherent, and create such objects with the
TTM_PL_FLAG_UNCACHED flag. This will make TTM allocate memory using the DMA
API, which will fix the identity mapping and allow us to safely map the
objects to user-space uncached.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Patches that take advantage of this in Mesa will follow up shortly. I'd like
to make sure the new flag is ok first before also adding it to libdrm.
 drm/nouveau/include/uapi/drm/nouveau_drm.h | 1 +
 drm/nouveau/nouveau_gem.c                  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drm/nouveau/include/uapi/drm/nouveau_drm.h b/drm/nouveau/include/uapi/drm/nouveau_drm.h
index 0d7608dc1a34..5507eead5863 100644
--- a/drm/nouveau/include/uapi/drm/nouveau_drm.h
+++ b/drm/nouveau/include/uapi/drm/nouveau_drm.h
@@ -39,6 +39,7 @@
 #define NOUVEAU_GEM_DOMAIN_VRAM      (1 << 1)
 #define NOUVEAU_GEM_DOMAIN_GART      (1 << 2)
 #define NOUVEAU_GEM_DOMAIN_MAPPABLE  (1 << 3)
+#define NOUVEAU_GEM_DOMAIN_COHERENT  (1 << 4)
 
 #define NOUVEAU_GEM_TILE_COMP        0x00030000 /* nv50-only */
 #define NOUVEAU_GEM_TILE_LAYOUT_MASK 0x0000ff00
diff --git a/drm/nouveau/nouveau_gem.c b/drm/nouveau/nouveau_gem.c
index 7c077fced1d1..0e690bf19fc9 100644
--- a/drm/nouveau/nouveau_gem.c
+++ b/drm/nouveau/nouveau_gem.c
@@ -189,6 +189,9 @@ nouveau_gem_new(struct drm_device *dev, int size, int align, uint32_t domain,
 	if (!flags || domain & NOUVEAU_GEM_DOMAIN_CPU)
 		flags |= TTM_PL_FLAG_SYSTEM;
 
+	if (domain & NOUVEAU_GEM_DOMAIN_COHERENT)
+		flags |= TTM_PL_FLAG_UNCACHED;
+
 	ret = nouveau_bo_new(dev, size, align, flags, tile_mode,
 			     tile_flags, NULL, NULL, pnvbo);
 	if (ret)
-- 
2.3.0
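The flag selection that the patch above extends can be sketched outside the kernel as follows. This is a minimal illustration, not the driver code: domain_to_placement() and placement_demo() are hypothetical helpers, the TTM_PL_FLAG_* values are stand-ins chosen for this example, and NOUVEAU_GEM_DOMAIN_CPU as bit 0 and the VRAM/GART branches are assumed from the surrounding function rather than shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Domain bits; VRAM..COHERENT are as shown in the patch, CPU is assumed. */
#define NOUVEAU_GEM_DOMAIN_CPU       (1 << 0)
#define NOUVEAU_GEM_DOMAIN_VRAM      (1 << 1)
#define NOUVEAU_GEM_DOMAIN_GART      (1 << 2)
#define NOUVEAU_GEM_DOMAIN_MAPPABLE  (1 << 3)
#define NOUVEAU_GEM_DOMAIN_COHERENT  (1 << 4)

/* Stand-in placement flags: names follow TTM, values are made up here. */
#define TTM_PL_FLAG_SYSTEM   (1u << 0)
#define TTM_PL_FLAG_TT       (1u << 1)
#define TTM_PL_FLAG_VRAM     (1u << 2)
#define TTM_PL_FLAG_UNCACHED (1u << 17)

/* Hypothetical helper mirroring the flag selection in nouveau_gem_new(). */
static uint32_t domain_to_placement(uint32_t domain)
{
	uint32_t flags = 0;

	/* Assumed: these two branches precede the lines shown in the diff. */
	if (domain & NOUVEAU_GEM_DOMAIN_VRAM)
		flags |= TTM_PL_FLAG_VRAM;
	if (domain & NOUVEAU_GEM_DOMAIN_GART)
		flags |= TTM_PL_FLAG_TT;

	/* Context lines from the diff: fall back to system memory. */
	if (!flags || (domain & NOUVEAU_GEM_DOMAIN_CPU))
		flags |= TTM_PL_FLAG_SYSTEM;

	/* Added by the patch: coherent objects get an uncached placement. */
	if (domain & NOUVEAU_GEM_DOMAIN_COHERENT)
		flags |= TTM_PL_FLAG_UNCACHED;

	return flags;
}

/* Usage example: a coherent GART object ends up TT + UNCACHED. */
static void placement_demo(void)
{
	uint32_t flags = domain_to_placement(NOUVEAU_GEM_DOMAIN_GART |
					     NOUVEAU_GEM_DOMAIN_COHERENT);

	assert(flags == (TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED));
}
```

Because the new bit is ORed in after the placement is chosen, it combines with any existing domain instead of replacing it, which is why user-space can request it alongside GART or MAPPABLE.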
Alexandre Courbot
2015-Feb-26 03:44 UTC
[Nouveau] [PATCH] instmem/gk20a: use roundup() macro
Use the roundup() macro to make the code easier to read and fix a warning when
the driver is compiled for 64-bit architectures.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
Ben, this should probably be squashed into patch 6/6 of my "RAM device removal
& IOMMU support" series, since it is not merged yet.

 drm/nouveau/nvkm/subdev/instmem/gk20a.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index a31196b6da8f..fcba72eb74a3 100644
--- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -335,8 +335,8 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 		priv->domain ? "IOMMU" : "DMA", args->size, args->align);
 
 	/* Round size and align to page bounds */
-	size = max((args->size + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE);
-	align = max((args->align + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE);
+	size = max(roundup(args->size, PAGE_SIZE), PAGE_SIZE);
+	align = max(roundup(args->align, PAGE_SIZE), PAGE_SIZE);
 
 	if (priv->domain)
 		ret = gk20a_instobj_ctor_iommu(parent, engine, oclass,
-- 
2.3.0
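For readers comparing the two forms in the hunk above, here is a small stand-alone sketch showing that the mask expression and the roundup() arithmetic give the same result when the page size is a power of two. Everything prefixed DEMO_/demo_ is hypothetical and defined only for this example: a 4 KiB page size is assumed, and demo_roundup() reproduces the arithmetic of the kernel macro without its type handling.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel takes these from the architecture. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Same arithmetic as roundup(): smallest multiple of y that is >= x. */
#define demo_roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Rounding as written before the patch. */
static unsigned long page_round_mask(uint32_t v)
{
	return (v + ~DEMO_PAGE_MASK) & DEMO_PAGE_MASK;
}

/* Rounding as written after the patch. */
static unsigned long page_round_roundup(uint32_t v)
{
	return demo_roundup(v, DEMO_PAGE_SIZE);
}

/* Round up to a page, but never return less than one page (the max()). */
static unsigned long page_round_min_one(uint32_t v)
{
	unsigned long r = page_round_roundup(v);

	return r > DEMO_PAGE_SIZE ? r : DEMO_PAGE_SIZE;
}

/* Usage example: both forms agree for every value up to three pages. */
static void roundup_demo(void)
{
	for (uint32_t v = 0; v <= 3 * DEMO_PAGE_SIZE; v++)
		assert(page_round_mask(v) == page_round_roundup(v));
}
```

Note that both forms round 0 to 0, which is why the patch keeps the max() against PAGE_SIZE: a zero-sized request still gets one page.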
Lucas Stach
2015-Feb-26 08:36 UTC
[Nouveau] [PATCH] gem: allow user-space to specify an object should be coherent
On Thursday, 26.02.2015 at 12:44 +0900, Alexandre Courbot wrote:
> User-space uses mappable BOs notably for fences, and expects that a
> value update by the GPU will be immediately visible through the
> user-space mapping.
>
> ARM has a property that may prevent this from happening though: memory
> can be mapped multiple times only if the different mappings share the
> same caching properties. However all the lowmem memory is already
> identity-mapped into the kernel with cache enabled, so when user-space
> requests an uncached mapping, we actually get an "undefined caching
> policy" one and this has strange side-effects described on Freedesktop
> bug 86690.
>
> To prevent this from happening, allow user-space to explicitly specify
> which objects should be coherent, and create such objects with the
> TTM_PL_FLAG_UNCACHED flag. This will make TTM allocate memory using the
> DMA API, which will fix the identity mapping and allow us to safely map
> the objects to user-space uncached.
>
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>

Ok, this is only needed because userspace skips the cpu_prep for the fence BO
reads. Since doing a cpu_prep for each of those reads would increase the
userspace fence overhead a lot, this flag seems to be the right thing to do.

Reviewed-by: Lucas Stach <dev at lynxeye.de>

> ---
> Patches that take advantage of this in Mesa will follow up shortly. I'd like
> to make sure the new flag is ok first before also adding it to libdrm.
>
>  drm/nouveau/include/uapi/drm/nouveau_drm.h | 1 +
>  drm/nouveau/nouveau_gem.c                  | 3 +++
>  2 files changed, 4 insertions(+)
>
> diff --git a/drm/nouveau/include/uapi/drm/nouveau_drm.h b/drm/nouveau/include/uapi/drm/nouveau_drm.h
> index 0d7608dc1a34..5507eead5863 100644
> --- a/drm/nouveau/include/uapi/drm/nouveau_drm.h
> +++ b/drm/nouveau/include/uapi/drm/nouveau_drm.h
> @@ -39,6 +39,7 @@
>  #define NOUVEAU_GEM_DOMAIN_VRAM      (1 << 1)
>  #define NOUVEAU_GEM_DOMAIN_GART      (1 << 2)
>  #define NOUVEAU_GEM_DOMAIN_MAPPABLE  (1 << 3)
> +#define NOUVEAU_GEM_DOMAIN_COHERENT  (1 << 4)
>
>  #define NOUVEAU_GEM_TILE_COMP        0x00030000 /* nv50-only */
>  #define NOUVEAU_GEM_TILE_LAYOUT_MASK 0x0000ff00
> diff --git a/drm/nouveau/nouveau_gem.c b/drm/nouveau/nouveau_gem.c
> index 7c077fced1d1..0e690bf19fc9 100644
> --- a/drm/nouveau/nouveau_gem.c
> +++ b/drm/nouveau/nouveau_gem.c
> @@ -189,6 +189,9 @@ nouveau_gem_new(struct drm_device *dev, int size, int align, uint32_t domain,
>  	if (!flags || domain & NOUVEAU_GEM_DOMAIN_CPU)
>  		flags |= TTM_PL_FLAG_SYSTEM;
>
> +	if (domain & NOUVEAU_GEM_DOMAIN_COHERENT)
> +		flags |= TTM_PL_FLAG_UNCACHED;
> +
>  	ret = nouveau_bo_new(dev, size, align, flags, tile_mode,
>  			     tile_flags, NULL, NULL, pnvbo);
>  	if (ret)
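To illustrate the review comment about user-space skipping cpu_prep for fence BO reads, here is a rough sketch of a fence check done as a plain memory read through a mapping. It is not Mesa's code: struct fence_map, fence_signaled() and fence_demo() are hypothetical names, the layout (a sequence number in the first word) is an assumption, and ordinary memory stands in for the mapped BO.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical view of a fence BO mapped into user-space: the GPU is
 * assumed to write an increasing sequence number to the first word.
 */
struct fence_map {
	volatile uint32_t seq;
};

/*
 * Return nonzero once the value in the mapping has reached `wanted`.
 * The signed difference keeps the comparison correct across 32-bit
 * wrap-around. No kernel call is made here: this is a plain memory
 * read, so it only sees fresh values if the mapping is coherent.
 */
static int fence_signaled(const struct fence_map *map, uint32_t wanted)
{
	return (int32_t)(map->seq - wanted) >= 0;
}

/* Usage example, with a local variable standing in for the mapping. */
static void fence_demo(void)
{
	struct fence_map m = { .seq = 41 };

	assert(!fence_signaled(&m, 42));
	m.seq = 42;	/* stands in for the GPU's write */
	assert(fence_signaled(&m, 42));
}
```

A check like this costs one load per poll, which is the overhead argument made in the review: adding a kernel round-trip before every read would make each poll far more expensive, so a coherent mapping is the cheaper way to guarantee the read sees the GPU's latest write.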