Alexandre Courbot
2014-Oct-27 10:34 UTC
[Nouveau] [PATCH 0/3] nouveau: support for custom VRAM domains
This series is to allow NVIDIA chips with shared memory to operate more efficiently (and to operate at all once we disable VRAM from the kernel driver) by allowing nouveau_screen to specify a domain to use for objects originally allocated into VRAM. If the domain is not overridden, the default NOUVEAU_BO_VRAM is used. A NV_VRAM_DOMAIN() macro is then introduced to be used in place of NOUVEAU_BO_VRAM when allocating objects, so the right domain for the chip is used. Doing so greatly simplifies the equation of memory management on shared- memory devices, since we don't have to simulate non-existent VRAM memory. In that respect it seems to be the right thing to do, and all things taken is not very intrusive. Tested on GK20A with Wayland and several EGL clients running, and working fine. Alexandre Courbot (3): nouveau: support for custom VRAM domains nvc0: use NV_VRAM_DOMAIN() macro gk20a: use NOUVEAU_BO_GART as VRAM domain src/gallium/drivers/nouveau/nouveau_buffer.c | 6 ++---- src/gallium/drivers/nouveau/nouveau_screen.c | 6 ++++++ src/gallium/drivers/nouveau/nouveau_screen.h | 4 ++++ src/gallium/drivers/nouveau/nv50/nv50_miptree.c | 4 ++-- src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 4 ++-- src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 8 ++++---- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 22 ++++++++++++++++------ .../drivers/nouveau/nvc0/nvc0_shader_state.c | 2 +- .../drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 2 +- src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 +- 13 files changed, 42 insertions(+), 24 deletions(-) -- 2.1.2
Alexandre Courbot
2014-Oct-27 10:34 UTC
[Nouveau] [PATCH 1/3] nouveau: support for custom VRAM domains
Some NVIDIA chips (e.g. GK20A) do not embed VRAM of their own and have complete shared access to system memory. For these systems, allocating objects in VRAM might lead to unneeded copies and sub-optimal memory management. It will also lead to errors if the kernel does not allow VRAM objects allocations. This patch adds a vram_domain member to struct nouveau_screen that can optionally be initialized to an alternative domain to use for VRAM allocations (most likely, NOUVEAU_BO_GART to allocate everything in system memory). If not initialized, the default NOUVEAU_BO_VRAM domain is used. Code that allocates GPU objects is then expected to use the NV_VRAM_DOMAIN() macro in place of NOUVEAU_BO_VRAM to ensure correct behavior on VRAM-less chips. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- src/gallium/drivers/nouveau/nouveau_screen.c | 6 ++++++ src/gallium/drivers/nouveau/nouveau_screen.h | 4 ++++ 2 files changed, 10 insertions(+) diff --git a/src/gallium/drivers/nouveau/nouveau_screen.c b/src/gallium/drivers/nouveau/nouveau_screen.c index 517978d88588..e4598c3fd24f 100644 --- a/src/gallium/drivers/nouveau/nouveau_screen.c +++ b/src/gallium/drivers/nouveau/nouveau_screen.c @@ -158,6 +158,12 @@ nouveau_screen_init(struct nouveau_screen *screen, struct nouveau_device *dev) size = sizeof(nvc0_data); } + /* + * Set default VRAM domain if not overridden + */ + if (!screen->vram_domain) + screen->vram_domain = NOUVEAU_BO_VRAM; + ret = nouveau_object_new(&dev->object, 0, NOUVEAU_FIFO_CHANNEL_CLASS, data, size, &screen->channel); if (ret) diff --git a/src/gallium/drivers/nouveau/nouveau_screen.h b/src/gallium/drivers/nouveau/nouveau_screen.h index cf06f7e88aa0..30041b271c94 100644 --- a/src/gallium/drivers/nouveau/nouveau_screen.h +++ b/src/gallium/drivers/nouveau/nouveau_screen.h @@ -51,6 +51,8 @@ struct nouveau_screen { boolean hint_buf_keep_sysmem_copy; + unsigned vram_domain; + struct { unsigned profiles_checked; unsigned profiles_present; @@ -94,6 +96,8 @@ struct nouveau_screen { #endif }; +#define NV_VRAM_DOMAIN(screen) ((screen)->vram_domain) + #ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS # define NOUVEAU_DRV_STAT(s, n, v) do { \ (s)->stats.named.n += (v); \ -- 2.1.2
Alexandre Courbot
2014-Oct-27 10:34 UTC
[Nouveau] [PATCH 2/3] nvc0: use NV_VRAM_DOMAIN() macro
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative VRAM domains for chips that do not use dedicated video memory. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- src/gallium/drivers/nouveau/nouveau_buffer.c | 6 ++---- src/gallium/drivers/nouveau/nv50/nv50_miptree.c | 4 ++-- src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_context.c | 4 ++-- src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 8 ++++---- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 12 ++++++------ src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_tex.c | 2 +- src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 +- 11 files changed, 22 insertions(+), 24 deletions(-) diff --git a/src/gallium/drivers/nouveau/nouveau_buffer.c b/src/gallium/drivers/nouveau/nouveau_buffer.c index 49ff100c4ece..4c21691ee671 100644 --- a/src/gallium/drivers/nouveau/nouveau_buffer.c +++ b/src/gallium/drivers/nouveau/nouveau_buffer.c @@ -658,13 +658,11 @@ nouveau_buffer_create(struct pipe_screen *pscreen, switch (buffer->base.usage) { case PIPE_USAGE_DEFAULT: case PIPE_USAGE_IMMUTABLE: - buffer->domain = NOUVEAU_BO_VRAM; - break; case PIPE_USAGE_DYNAMIC: /* For most apps, we'd have to do staging transfers to avoid sync * with this usage, and GART -> GART copies would be suboptimal. */ - buffer->domain = NOUVEAU_BO_VRAM; + buffer->domain = NV_VRAM_DOMAIN(screen); break; case PIPE_USAGE_STAGING: case PIPE_USAGE_STREAM: @@ -676,7 +674,7 @@ nouveau_buffer_create(struct pipe_screen *pscreen, } } else { if (buffer->base.bind & screen->vidmem_bindings) - buffer->domain = NOUVEAU_BO_VRAM; + buffer->domain = NV_VRAM_DOMAIN(screen); else if (buffer->base.bind & screen->sysmem_bindings) buffer->domain = NOUVEAU_BO_GART; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c index 1aacaec89acf..ddff508c475c 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_miptree.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_miptree.c @@ -359,7 +359,7 @@ nv50_miptree_create(struct pipe_screen *pscreen, if (!bo_config.nv50.memtype && (pt->bind & PIPE_BIND_SHARED)) mt->base.domain = NOUVEAU_BO_GART; else - mt->base.domain = NOUVEAU_BO_VRAM; + mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen)); bo_flags = mt->base.domain | NOUVEAU_BO_NOSNOOP; if (mt->base.base.bind & (PIPE_BIND_CURSOR | PIPE_BIND_DISPLAY_TARGET)) @@ -401,7 +401,7 @@ nv50_miptree_from_handle(struct pipe_screen *pscreen, FREE(mt); return NULL; } - mt->base.domain = NOUVEAU_BO_VRAM; + mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen)); mt->base.address = mt->base.bo->offset; mt->base.base = *templ; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c index ad287a2af6b6..56fc83d3679f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c @@ -57,7 +57,7 @@ nvc0_screen_compute_setup(struct nvc0_screen *screen, return ret; } - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 0, 1 << 12, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 0, 1 << 12, NULL, &screen->parm); if (ret) return ret; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c index b33a6731a1c0..a0189f528e6f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.c @@ -322,7 +322,7 @@ nvc0_create(struct pipe_screen *pscreen, void *priv) /* add permanently resident buffers to bufctxts */ - flags = NOUVEAU_BO_VRAM | NOUVEAU_BO_RD; + flags = NV_VRAM_DOMAIN(&screen->base) | NOUVEAU_BO_RD; BCTX_REFN_bo(nvc0->bufctx_3d, SCREEN, flags, screen->text); BCTX_REFN_bo(nvc0->bufctx_3d, SCREEN, flags, screen->uniform_bo); @@ -333,7 +333,7 @@ nvc0_create(struct pipe_screen *pscreen, void *priv) BCTX_REFN_bo(nvc0->bufctx_cp, CP_SCREEN, flags, screen->parm); } - flags = NOUVEAU_BO_VRAM | NOUVEAU_BO_RDWR; + flags = NV_VRAM_DOMAIN(&screen->base) | NOUVEAU_BO_RDWR; if (screen->poly_cache) BCTX_REFN_bo(nvc0->bufctx_3d, SCREEN, flags, screen->poly_cache); diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c index 1beda7d4a2c8..3afde38e71ba 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_miptree.c @@ -302,7 +302,7 @@ nvc0_miptree_create(struct pipe_screen *pscreen, if (!bo_config.nvc0.memtype && (pt->usage == PIPE_USAGE_STAGING || pt->bind & PIPE_BIND_SHARED)) mt->base.domain = NOUVEAU_BO_GART; else - mt->base.domain = NOUVEAU_BO_VRAM; + mt->base.domain = NV_VRAM_DOMAIN(nouveau_screen(pscreen)); bo_flags = mt->base.domain | NOUVEAU_BO_NOSNOOP; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c index 21be8b7240ea..7600dc4e7208 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c @@ -734,12 +734,12 @@ nvc0_program_upload_code(struct nvc0_context *nvc0, struct nvc0_program *prog) if (!is_cp) nvc0->base.push_data(&nvc0->base, screen->text, prog->code_base, - NOUVEAU_BO_VRAM, NVC0_SHADER_HEADER_SIZE, prog->hdr); + NV_VRAM_DOMAIN(&screen->base), NVC0_SHADER_HEADER_SIZE, prog->hdr); nvc0->base.push_data(&nvc0->base, screen->text, code_pos, - NOUVEAU_BO_VRAM, prog->code_size, prog->code); + NV_VRAM_DOMAIN(&screen->base), prog->code_size, prog->code); if (prog->immd_size) nvc0->base.push_data(&nvc0->base, - screen->text, prog->immd_base, NOUVEAU_BO_VRAM, + screen->text, prog->immd_base, NV_VRAM_DOMAIN(&screen->base), prog->immd_size, prog->immd_data); BEGIN_NVC0(nvc0->base.pushbuf, NVC0_3D(MEM_BARRIER), 1); @@ -770,7 +770,7 @@ nvc0_program_library_upload(struct nvc0_context *nvc0) return; nvc0->base.push_data(&nvc0->base, - screen->text, screen->lib_code->start, NOUVEAU_BO_VRAM, + screen->text, screen->lib_code->start, NV_VRAM_DOMAIN(&screen->base), size, code); /* no need for a memory barrier, will be emitted with first program */ } diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index 88fc9264829f..ac5823e4a8d5 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -572,7 +572,7 @@ nvc0_screen_resize_tls_area(struct nvc0_screen *screen, size = align(size, 1 << 17); - ret = nouveau_bo_new(screen->base.device, NOUVEAU_BO_VRAM, 1 << 17, size, + ret = nouveau_bo_new(screen->base.device, NV_VRAM_DOMAIN(&screen->base), 1 << 17, size, NULL, &bo); if (ret) { NOUVEAU_ERR("failed to allocate TLS area, size: 0x%"PRIx64"\n", size); @@ -815,7 +815,7 @@ nvc0_screen_create(struct nouveau_device *dev) nvc0_magic_3d_init(push, screen->eng3d->oclass); - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 17, 1 << 20, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 17, 1 << 20, NULL, &screen->text); if (ret) goto fail; @@ -825,12 +825,12 @@ nvc0_screen_create(struct nouveau_device *dev) */ nouveau_heap_init(&screen->text_heap, 0, (1 << 20) - 0x100); - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 12, 6 << 16, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 12, 6 << 16, NULL, &screen->uniform_bo); if (ret) goto fail; - PUSH_REFN (push, screen->uniform_bo, NOUVEAU_BO_VRAM | NOUVEAU_BO_WR); + PUSH_REFN (push, screen->uniform_bo, NV_VRAM_DOMAIN(&screen->base) | NOUVEAU_BO_WR); for (i = 0; i < 5; ++i) { /* TIC and TSC entries for each unit (nve4+ only) */ @@ -901,7 +901,7 @@ nvc0_screen_create(struct nouveau_device *dev) PUSH_DATA (push, 0); if (screen->eng3d->oclass < GM107_3D_CLASS) { - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 17, 1 << 20, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 17, 1 << 20, NULL, &screen->poly_cache); if (ret) goto fail; @@ -912,7 +912,7 @@ nvc0_screen_create(struct nouveau_device *dev) PUSH_DATA (push, 3); } - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 17, 1 << 17, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 17, 1 << 17, NULL, &screen->txc); if (ret) goto fail; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c index 1000d8286d77..3fcca1671c8c 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_shader_state.c @@ -34,7 +34,7 @@ nvc0_program_update_context_state(struct nvc0_context *nvc0, struct nouveau_pushbuf *push = nvc0->base.pushbuf; if (prog && prog->need_tls) { - const uint32_t flags = NOUVEAU_BO_VRAM | NOUVEAU_BO_RDWR; + const uint32_t flags = NV_VRAM_DOMAIN(&nvc0->screen->base) | NOUVEAU_BO_RDWR; if (!nvc0->state.tls_required) BCTX_REFN_bo(nvc0->bufctx_3d, TLS, flags, nvc0->screen->tls); nvc0->state.tls_required |= 1 << stage; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c index 25a3232b48d9..696eacaa1e20 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c @@ -439,7 +439,7 @@ nvc0_constbufs_validate(struct nvc0_context *nvc0) BEGIN_NVC0(push, NVC0_3D(CB_BIND(s)), 1); PUSH_DATA (push, (0 << 4) | 1); } - nvc0_cb_push(&nvc0->base, bo, NOUVEAU_BO_VRAM, + nvc0_cb_push(&nvc0->base, bo, NV_VRAM_DOMAIN(&nvc0->screen->base), base, nvc0->state.uniform_buffer_bound[s], 0, (size + 3) / 4, nvc0->constbuf[s][0].u.data); diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c b/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c index db6b6036a72d..dc4389d7432f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c @@ -396,7 +396,7 @@ nvc0_validate_tsc(struct nvc0_context *nvc0, int s) tsc->id = nvc0_screen_tsc_alloc(nvc0->screen, tsc); nvc0_m2mf_push_linear(&nvc0->base, nvc0->screen->txc, - 65536 + tsc->id * 32, NOUVEAU_BO_VRAM, + 65536 + tsc->id * 32, NV_VRAM_DOMAIN(&nvc0->screen->base), 32, tsc->tsc); need_flush = TRUE; } diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c index f243316b899c..fce02a7cc576 100644 --- a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c +++ b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c @@ -63,7 +63,7 @@ nve4_screen_compute_setup(struct nvc0_screen *screen, return ret; } - ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 0, NVE4_CP_PARAM_SIZE, NULL, + ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 0, NVE4_CP_PARAM_SIZE, NULL, &screen->parm); if (ret) return ret; -- 2.1.2
Alexandre Courbot
2014-Oct-27 10:34 UTC
[Nouveau] [PATCH 3/3] gk20a: use NOUVEAU_BO_GART as VRAM domain
GK20A does not have dedicated VRAM, therefore allocating in VRAM can be sub-optimal and sometimes even harmful. Set its VRAM domain to NOUVEAU_BO_GART so all objects are allocated in system memory. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index ac5823e4a8d5..ad143cd9a140 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -620,6 +620,16 @@ nvc0_screen_create(struct nouveau_device *dev) return NULL; pscreen = &screen->base.base; + /* Recognize chipsets with no VRAM */ + switch (dev->chipset) { + /* GK20A */ + case 0xea: + screen->base.vram_domain = NOUVEAU_BO_GART; + break; + default: + break; + } + ret = nouveau_screen_init(&screen->base, dev); if (ret) { nvc0_screen_destroy(pscreen); -- 2.1.2
Ilia Mirkin
2014-Oct-29 15:29 UTC
[Nouveau] [PATCH 3/3] gk20a: use NOUVEAU_BO_GART as VRAM domain
On Mon, Oct 27, 2014 at 6:34 AM, Alexandre Courbot <acourbot at nvidia.com> wrote:> GK20A does not have dedicated VRAM, therefore allocating in VRAM can be > sub-optimal and sometimes even harmful. Set its VRAM domain to > NOUVEAU_BO_GART so all objects are allocated in system memory. > > Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> > --- > src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > index ac5823e4a8d5..ad143cd9a140 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > @@ -620,6 +620,16 @@ nvc0_screen_create(struct nouveau_device *dev) > return NULL; > pscreen = &screen->base.base; > > + /* Recognize chipsets with no VRAM */ > + switch (dev->chipset) { > + /* GK20A */ > + case 0xea: > + screen->base.vram_domain = NOUVEAU_BO_GART;I think you also want to set vidmem_bindings = 0... although potentially after the |= that's done below. Although I guess that constbuf + command args buf need to be |='d into the sysmem_bindings for this to work out well. That said, we don't really handle explicit migration well right now, and those PIPE_BIND_* are *incredibly* misleading and don't actually necessarily reflect the current usage. [I have some patches to improve the situation, but you don't really have to worry about that.]> + break; > + default: > + break; > + } > + > ret = nouveau_screen_init(&screen->base, dev); > if (ret) { > nvc0_screen_destroy(pscreen); > -- > 2.1.2 >
Apparently Analagous Threads
- [PATCH v2 0/3] nouveau: support for custom VRAM domains
- [PATCH v3 0/2] nouveau: support for custom VRAM domains
- [PATCH v2 2/3] nvc0: use NV_VRAM_DOMAIN() macro
- [PATCH try 2 1/2] gallium/nouveau: decouple nouveau_fence implementation from screen
- [PATCH 1/2] gallium/nouveau: decouple nouveau_fence implementation from screen