NVIDIA has been exploring ways to better support the effort for an
upstream kernel mode driver for GPUs that are capable of running GSP-RM
firmware, since the introduction[1] to Nova.

Use cases have been identified for which separating the core GPU
programming out of the full DRM driver stack is a strong requirement
from our key customers.

An upstreamed NVIDIA GPU driver should be able to support current and
emerging customer use cases for vGPU hosts. NVIDIA's vGPU deployments
to date do not support compute or graphics functionality within the
hypervisor host, and have no dependency on the Linux graphics subsystem,
instead implementing the minimal functionality required to run vGPU
guest VMs.

For security-sensitive environments such as cloud infrastructure, it's
important to continue support for running a minimal footprint vGPU host
driver in a stripped-down / barebones kernel environment.

This can be achieved by supporting both VFIO and DRM drivers as clients
of a core driver, without requiring a full-fledged DRM driver (or the
DRM subsystem itself) to be built into the host kernel.

A core driver would be responsible for booting and communicating with
GSP-RM, enumeration of HW configuration, shared/partitioned resource
management, exception handling, and event dispatch.

The DRM driver would do all the standard things a DRM driver does, and
implement GPU memory management (TTM/HMM), KMS, command submission etc,
as well as providing UAPI for userspace clients. These features would
be implemented using HW resources allocated from a core driver, rather
than the DRM driver being directly responsible for HW programming.

As Nouveau's KMD is already split (in the logical sense) along similar
lines, we're using it here for the purposes of this RFC to demonstrate
the feasibility of such an architecture, and open it up for discussion.

A link[2] to a tree containing the patches is below.

[1] https://lore.kernel.org/all/3ed356488c9b0ca93845501425d427309f4cf616.camel@redhat.com/
[2] https://gitlab.freedesktop.org/bskeggs/nouveau/-/tree/00.03-module

*** BLURB HERE ***

Ben Skeggs (2):
  drm/nouveau/nvkm: export symbols needed by the drm driver
  drm/nouveau/nvkm: separate out into nvkm.ko

 drivers/gpu/drm/nouveau/Kbuild                      |  4 ++--
 drivers/gpu/drm/nouveau/include/nvkm/core/module.h  |  3 ---
 drivers/gpu/drm/nouveau/nouveau_drm.c               | 10 +---------
 drivers/gpu/drm/nouveau/nvkm/core/driver.c          |  1 +
 drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c          |  2 ++
 drivers/gpu/drm/nouveau/nvkm/core/mm.c              |  4 ++++
 drivers/gpu/drm/nouveau/nvkm/device/acpi.c          |  1 +
 drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c       |  1 +
 drivers/gpu/drm/nouveau/nvkm/module.c               |  8 ++++++--
 drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c     |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c      |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c       |  3 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c     |  3 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c      |  2 ++
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c       |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c    |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c     |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c     |  1 +
 19 files changed, 33 insertions(+), 16 deletions(-)

-- 
2.44.0

Ben Skeggs
2024-Jun-13 17:02 UTC
[PATCH 1/2] drm/nouveau/nvkm: export symbols needed by the drm driver
The primary interfaces to NVKM are through NVIF, but there are a small
number of functions which are called directly.

Signed-off-by: Ben Skeggs <bskeggs at nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/core/driver.c          | 1 +
 drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c          | 2 ++
 drivers/gpu/drm/nouveau/nvkm/core/mm.c              | 4 ++++
 drivers/gpu/drm/nouveau/nvkm/device/acpi.c          | 1 +
 drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c       | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c     | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c      | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c       | 3 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c     | 3 +++
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c      | 2 ++
 drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c       | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c    | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c     | 1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c     | 1 +
 15 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nvkm/core/driver.c b/drivers/gpu/drm/nouveau/nvkm/core/driver.c
index dcc5dc7f246e..d6e8117a2a74 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/driver.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/driver.c
@@ -78,3 +78,4 @@ nvkm_driver_ctor(struct nvkm_device *device, const struct nvif_driver **pdrv,
 	*pdrv = &nvkm_driver;
 	return 0;
 }
+EXPORT_SYMBOL(nvkm_driver_ctor);
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c b/drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c
index d6de2b3ed2c3..a06ced74fb10 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c
@@ -224,6 +224,7 @@ nvkm_gpuobj_del(struct nvkm_gpuobj **pgpuobj)
 		*pgpuobj = NULL;
 	}
 }
+EXPORT_SYMBOL(nvkm_gpuobj_del);
 
 int
 nvkm_gpuobj_new(struct nvkm_device *device, u32 size, int align, bool zero,
@@ -240,6 +241,7 @@ nvkm_gpuobj_new(struct nvkm_device *device, u32 size, int align, bool zero,
 		nvkm_gpuobj_del(pgpuobj);
 	return ret;
 }
+EXPORT_SYMBOL(nvkm_gpuobj_new);
 
 /* the below is basically only here to support sharing the paged dma object
  * for PCI(E)GART on <=nv4x chipsets, and should *not* be expected to work
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/mm.c b/drivers/gpu/drm/nouveau/nvkm/core/mm.c
index f78a06a6b2f1..c2a66cfe2a1e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/mm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/mm.c
@@ -81,6 +81,7 @@ nvkm_mm_free(struct nvkm_mm *mm, struct nvkm_mm_node **pthis)
 
 	*pthis = NULL;
 }
+EXPORT_SYMBOL(nvkm_mm_free);
 
 static struct nvkm_mm_node *
 region_head(struct nvkm_mm *mm, struct nvkm_mm_node *a, u32 size)
@@ -156,6 +157,7 @@ nvkm_mm_head(struct nvkm_mm *mm, u8 heap, u8 type, u32 size_max, u32 size_min,
 
 	return -ENOSPC;
 }
+EXPORT_SYMBOL(nvkm_mm_head);
 
 static struct nvkm_mm_node *
 region_tail(struct nvkm_mm *mm, struct nvkm_mm_node *a, u32 size)
@@ -278,6 +280,7 @@ nvkm_mm_init(struct nvkm_mm *mm, u8 heap, u32 offset, u32 length, u32 block)
 	mm->heap_nodes++;
 	return 0;
 }
+EXPORT_SYMBOL(nvkm_mm_init);
 
 int
 nvkm_mm_fini(struct nvkm_mm *mm)
@@ -305,3 +308,4 @@ nvkm_mm_fini(struct nvkm_mm *mm)
 	mm->heap_nodes = 0;
 	return 0;
 }
+EXPORT_SYMBOL(nvkm_mm_fini);
diff --git a/drivers/gpu/drm/nouveau/nvkm/device/acpi.c b/drivers/gpu/drm/nouveau/nvkm/device/acpi.c
index ff8a3027c1bc..941e7d2a29a8 100644
--- a/drivers/gpu/drm/nouveau/nvkm/device/acpi.c
+++ b/drivers/gpu/drm/nouveau/nvkm/device/acpi.c
@@ -109,6 +109,7 @@ void nvkm_acpi_switcheroo_set_powerdown(void)
 		NOUVEAU_DSM_OPTIMUS_SET_POWERDOWN, &result);
 
 }
+EXPORT_SYMBOL(nvkm_acpi_switcheroo_set_powerdown);
 
 /*
  * On some platforms, _DSM(nvkm_op_dsm_muid, func0) has special
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c
index f5e68f09df76..151c10558c82 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c
@@ -76,6 +76,7 @@ nvkm_gr_units(struct nvkm_gr *gr)
 		return gr->func->units(gr);
 	return 0;
 }
+EXPORT_SYMBOL(nvkm_gr_units);
 
 int
 nvkm_gr_tlb_flush(struct nvkm_gr *gr)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c
index b54f044c4483..3ac3dbc0c03a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c
@@ -2317,6 +2317,7 @@ nvbios_exec(struct nvbios_init *init)
 	init->nested--;
 	return 0;
 }
+EXPORT_SYMBOL(nvbios_exec);
 
 int
 nvbios_post(struct nvkm_subdev *subdev, bool execute)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c
index 2ec84b8a3b3a..1cd5b1996489 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c
@@ -438,3 +438,4 @@ nvbios_pll_parse(struct nvkm_bios *bios, u32 type, struct nvbios_pll *info)
 
 	return 0;
 }
+EXPORT_SYMBOL(nvbios_pll_parse);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c
index 8a286a9349ac..b1fab6332ed1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c
@@ -36,6 +36,7 @@ nvkm_fb_tile_fini(struct nvkm_fb *fb, int region, struct nvkm_fb_tile *tile)
 {
 	fb->func->tile.fini(fb, region, tile);
 }
+EXPORT_SYMBOL(nvkm_fb_tile_fini);
 
 void
 nvkm_fb_tile_init(struct nvkm_fb *fb, int region, u32 addr, u32 size,
@@ -43,6 +44,7 @@ nvkm_fb_tile_init(struct nvkm_fb *fb, int region, u32 addr, u32 size,
 {
 	fb->func->tile.init(fb, region, addr, size, pitch, flags, tile);
 }
+EXPORT_SYMBOL(nvkm_fb_tile_init);
 
 void
 nvkm_fb_tile_prog(struct nvkm_fb *fb, int region, struct nvkm_fb_tile *tile)
@@ -56,6 +58,7 @@ nvkm_fb_tile_prog(struct nvkm_fb *fb, int region, struct nvkm_fb_tile *tile)
 			nvkm_engine_tile(device->mpeg, region);
 	}
 }
+EXPORT_SYMBOL(nvkm_fb_tile_prog);
 
 static void
 nvkm_fb_sysmem_flush_page_init(struct nvkm_device *device)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c
index b196baa376dc..f93ce38afd16 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c
@@ -75,6 +75,7 @@ nvkm_gpio_find(struct nvkm_gpio *gpio, int idx, u8 tag, u8 line,
 
 	return -ENOENT;
 }
+EXPORT_SYMBOL(nvkm_gpio_find);
 
 int
 nvkm_gpio_set(struct nvkm_gpio *gpio, int idx, u8 tag, u8 line, int state)
@@ -91,6 +92,7 @@ nvkm_gpio_set(struct nvkm_gpio *gpio, int idx, u8 tag, u8 line, int state)
 
 	return ret;
 }
+EXPORT_SYMBOL(nvkm_gpio_set);
 
 int
 nvkm_gpio_get(struct nvkm_gpio *gpio, int idx, u8 tag, u8 line)
@@ -107,6 +109,7 @@ nvkm_gpio_get(struct nvkm_gpio *gpio, int idx, u8 tag, u8 line)
 
 	return ret;
 }
+EXPORT_SYMBOL(nvkm_gpio_get);
 
 static void
 nvkm_gpio_intr_fini(struct nvkm_event *event, int type, int index)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c
index 731b2f68d3db..845e7f41076e 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c
@@ -71,6 +71,7 @@ nvkm_i2c_bus_find(struct nvkm_i2c *i2c, int id)
 
 	return NULL;
 }
+EXPORT_SYMBOL(nvkm_i2c_bus_find);
 
 struct nvkm_i2c_aux *
 nvkm_i2c_aux_find(struct nvkm_i2c *i2c, int id)
@@ -84,6 +85,7 @@ nvkm_i2c_aux_find(struct nvkm_i2c *i2c, int id)
 
 	return NULL;
 }
+EXPORT_SYMBOL(nvkm_i2c_aux_find);
 
 static void
 nvkm_i2c_intr_fini(struct nvkm_event *event, int type, int id)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
index ed50cc3736b9..47fade442d14 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c
@@ -189,6 +189,7 @@ nvkm_i2c_bus_probe(struct nvkm_i2c_bus *bus, const char *what,
 	BUS_DBG(bus, "no devices found.");
 	return -ENODEV;
 }
+EXPORT_SYMBOL(nvkm_i2c_bus_probe);
 
 void
 nvkm_i2c_bus_del(struct nvkm_i2c_bus **pbus)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c
index 8f0ccd3664eb..6e6d7bc0ea1f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c
@@ -126,6 +126,7 @@ nvkm_iccsense_read_all(struct nvkm_iccsense *iccsense)
 	}
 	return result;
 }
+EXPORT_SYMBOL(nvkm_iccsense_read_all);
 
 static void *
 nvkm_iccsense_dtor(struct nvkm_subdev *subdev)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
index fc5ee118e910..2a32559b38f4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c
@@ -33,6 +33,7 @@ nvkm_therm_temp_get(struct nvkm_therm *therm)
 		return therm->func->temp_get(therm);
 	return -ENODEV;
 }
+EXPORT_SYMBOL(nvkm_therm_temp_get);
 
 static int
 nvkm_therm_update_trip(struct nvkm_therm *therm)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c
index f8fa43c8a7d2..418f441897f1 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c
@@ -164,6 +164,7 @@ nvkm_therm_fan_sense(struct nvkm_therm *therm)
 	} else
 		return 0;
 }
+EXPORT_SYMBOL(nvkm_therm_fan_sense);
 
 int
 nvkm_therm_fan_user_get(struct nvkm_therm *therm)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c
index a17a6dd8d3de..07d861440232 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c
@@ -46,6 +46,7 @@ nvkm_volt_get(struct nvkm_volt *volt)
 	}
 	return ret;
 }
+EXPORT_SYMBOL(nvkm_volt_get);
 
 static int
 nvkm_volt_set(struct nvkm_volt *volt, u32 uv)
-- 
2.44.0

[PATCH 2/2] drm/nouveau/nvkm: separate out into nvkm.ko

Signed-off-by: Ben Skeggs <bskeggs at nvidia.com>
---
 drivers/gpu/drm/nouveau/Kbuild                     |  4 ++--
 drivers/gpu/drm/nouveau/include/nvkm/core/module.h |  3 ---
 drivers/gpu/drm/nouveau/nouveau_drm.c              | 10 +---------
 drivers/gpu/drm/nouveau/nvkm/module.c              |  8 ++++++--
 4 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/Kbuild b/drivers/gpu/drm/nouveau/Kbuild
index cc471ab6a7ec..b62c6858fb7b 100644
--- a/drivers/gpu/drm/nouveau/Kbuild
+++ b/drivers/gpu/drm/nouveau/Kbuild
@@ -8,11 +8,11 @@ ccflags-y += -I $(NOUVEAU_PATH)/$(src)
 
 # NVKM - HW resource manager
 include $(src)/nvkm/Kbuild
-nouveau-y := $(nvkm-y)
+obj-$(CONFIG_DRM_NOUVEAU) += nvkm.o
 
 # NVIF - NVKM interface library (NVKM user interface also defined here)
 include $(src)/nvif/Kbuild
-nouveau-y += $(nvif-y)
+nouveau-y := $(nvif-y)
 
 # DRM - general
 ifdef CONFIG_X86
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/module.h b/drivers/gpu/drm/nouveau/include/nvkm/core/module.h
index fc42ace93a1c..d1ad6aae9911 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/core/module.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/core/module.h
@@ -3,8 +3,5 @@
 #define __NVKM_MODULE_H__
 #include <linux/module.h>
 
-int __init nvkm_init(void);
-void __exit nvkm_exit(void);
-
 extern int nvkm_runpm;
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 7e77e950eba2..4f55cd73d1b3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1174,7 +1174,7 @@ static const struct dev_pm_ops nouveau_pm_ops = {
 
 static const struct auxiliary_device_id
 nouveau_drm_id_table[] = {
-	{ .name = "nouveau.device" },
+	{ .name = "nvkm.device" },
 	{}
 };
 
@@ -1190,8 +1190,6 @@ nouveau_auxdrv = {
 static int __init
 nouveau_drm_init(void)
 {
-	int ret;
-
 	nouveau_display_options();
 
 	if (nouveau_modeset == -1) {
@@ -1202,10 +1200,6 @@ nouveau_drm_init(void)
 	if (!nouveau_modeset)
 		return 0;
 
-	ret = nvkm_init();
-	if (ret)
-		return ret;
-
 	nouveau_backlight_ctor();
 
 	return auxiliary_driver_register(&nouveau_auxdrv);
@@ -1223,8 +1217,6 @@ nouveau_drm_exit(void)
 
 	if (IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM))
 		mmu_notifier_synchronize();
-
-	nvkm_exit();
 }
 
 module_init(nouveau_drm_init);
diff --git a/drivers/gpu/drm/nouveau/nvkm/module.c b/drivers/gpu/drm/nouveau/nvkm/module.c
index c14dd7fa15c2..d0ae023cdc74 100644
--- a/drivers/gpu/drm/nouveau/nvkm/module.c
+++ b/drivers/gpu/drm/nouveau/nvkm/module.c
@@ -26,7 +26,7 @@
 
 int nvkm_runpm = -1;
 
-void __exit
+static void __exit
 nvkm_exit(void)
 {
 #ifdef CONFIG_PCI
@@ -39,7 +39,7 @@ nvkm_exit(void)
 #endif
 }
 
-int __init
+static int __init
 nvkm_init(void)
 {
 	int ret;
@@ -60,3 +60,7 @@ nvkm_init(void)
 
 	return 0;
 }
+
+MODULE_LICENSE("GPL and additional rights");
+module_init(nvkm_init);
+module_exit(nvkm_exit);
-- 
2.44.0

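A note on the id-table change in patch 2/2. The auxiliary bus names each
device "<module>.<device>.<id>", where <module> is the name of the module
that adds the device, and a driver's id-table entry is compared against
everything before the last '.'. Assuming the auxiliary device is now added
from code built into nvkm.ko (the patch implies this but does not show
it), the device name prefix becomes "nvkm", which would explain the switch
from "nouveau.device" to "nvkm.device". The following is a minimal
userspace sketch of that matching rule, not the kernel's own code;
aux_name_matches() is a hypothetical helper name introduced only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of auxiliary bus name matching (illustrative only).
 *
 * dev_name: full device name, "<module>.<device>.<id>"
 * id_name:  driver id-table entry, "<module>.<device>"
 *
 * The entry matches when it equals the device name with the trailing
 * ".<id>" removed, i.e. everything before the last '.'.
 */
bool aux_name_matches(const char *dev_name, const char *id_name)
{
	const char *last_dot;
	size_t prefix_len;

	assert(dev_name && id_name);

	last_dot = strrchr(dev_name, '.');
	if (!last_dot)
		return false;	/* no ".<id>" suffix, cannot match */

	prefix_len = (size_t)(last_dot - dev_name);
	return strlen(id_name) == prefix_len &&
	       strncmp(dev_name, id_name, prefix_len) == 0;
}
```

Under this rule a device named "nvkm.device.0" matches the entry
"nvkm.device" but not the old entry "nouveau.device", so a driver that
kept the old string would never be probed.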
Daniel Vetter
2024-Jun-17 13:55 UTC
[RFC] GPU driver with separate "core" and "DRM" modules
On Fri, Jun 14, 2024 at 03:02:09AM +1000, Ben Skeggs wrote:
> NVIDIA has been exploring ways to better support the effort for an
> upstream kernel mode driver for GPUs that are capable of running GSP-RM
> firmware, since the introduction[1] to Nova.
> 
> Use cases have been identified for which separating the core GPU
> programming out of the full DRM driver stack is a strong requirement
> from our key customers.
> 
> An upstreamed NVIDIA GPU driver should be able to support current and
> emerging customer use cases for vGPU hosts. NVIDIA's vGPU deployments
> to date do not support compute or graphics functionality within the
> hypervisor host, and have no dependency on the Linux graphics subsystem,
> instead implementing the minimal functionality required to run vGPU
> guest VMs.
> 
> For security-sensitive environments such as cloud infrastructure, it's
> important to continue support for running a minimal footprint vGPU host
> driver in a stripped-down / barebones kernel environment.
> 
> This can be achieved by supporting both VFIO and DRM drivers as clients
> of a core driver, without requiring a full-fledged DRM driver (or the
> DRM subsystem itself) to be built into the host kernel.
> 
> A core driver would be responsible for booting and communicating with
> GSP-RM, enumeration of HW configuration, shared/partitioned resource
> management, exception handling, and event dispatch.
> 
> The DRM driver would do all the standard things a DRM driver does, and
> implement GPU memory management (TTM/HMM), KMS, command submission etc,
> as well as providing UAPI for userspace clients. These features would
> be implemented using HW resources allocated from a core driver, rather
> than the DRM driver being directly responsible for HW programming.
> 
> As Nouveau's KMD is already split (in the logical sense) along similar
> lines, we're using it here for the purposes of this RFC to demonstrate
> the feasibility of such an architecture, and open it up for discussion.

Sounds reasonable. Only bikeshed I have to add is that the blessed way
(according to the cool kernel maintainers at least or something) to
structure this is using auxbus. Definitely when you end up with more than
one driver binding to the core (like maybe some system management
interface thing, or perhaps a special compute-only kernel driver).

https://dri.freedesktop.org/docs/drm/driver-api/auxiliary_bus.html

Cheers, Sima

> 
> A link[2] to a tree containing the patches is below.
> 
> [1] https://lore.kernel.org/all/3ed356488c9b0ca93845501425d427309f4cf616.camel@redhat.com/
> [2] https://gitlab.freedesktop.org/bskeggs/nouveau/-/tree/00.03-module
> 
> *** BLURB HERE ***
> 
> Ben Skeggs (2):
>   drm/nouveau/nvkm: export symbols needed by the drm driver
>   drm/nouveau/nvkm: separate out into nvkm.ko
> 
>  drivers/gpu/drm/nouveau/Kbuild                      |  4 ++--
>  drivers/gpu/drm/nouveau/include/nvkm/core/module.h  |  3 ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c               | 10 +---------
>  drivers/gpu/drm/nouveau/nvkm/core/driver.c          |  1 +
>  drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c          |  2 ++
>  drivers/gpu/drm/nouveau/nvkm/core/mm.c              |  4 ++++
>  drivers/gpu/drm/nouveau/nvkm/device/acpi.c          |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c       |  1 +
>  drivers/gpu/drm/nouveau/nvkm/module.c               |  8 ++++++--
>  drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c     |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c      |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c       |  3 +++
>  drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c     |  3 +++
>  drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c      |  2 ++
>  drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c       |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c    |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c     |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c     |  1 +
>  19 files changed, 33 insertions(+), 16 deletions(-)
> 
> -- 
> 2.44.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

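To make the "more than one driver binding to the core" case concrete,
here is a small userspace toy model, not kernel code. It assumes the core
publishes one child device per client-facing function and that each client
driver binds to the function it names. Every identifier in it (struct
core, struct core_func, struct client_drv, core_add_func(), core_bind(),
and the "drm" / "vfio" function names) is hypothetical; a real
implementation would use struct auxiliary_device and
auxiliary_driver_register() as described in the documentation linked
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CORE_MAX_FUNCS 4

struct core_func;

/* A client driver (e.g. DRM or VFIO) that binds to one core function. */
struct client_drv {
	const char *func_name;			/* function it wants */
	int (*probe)(struct core_func *func);	/* called on bind */
};

/* One child device published by the core per client-facing function. */
struct core_func {
	const char *name;
	const struct client_drv *bound;		/* NULL until a client binds */
};

struct core {
	struct core_func funcs[CORE_MAX_FUNCS];
	size_t nr_funcs;
};

/* Core side: publish a function that a client may bind to. */
int core_add_func(struct core *core, const char *name)
{
	assert(core && name);

	if (core->nr_funcs == CORE_MAX_FUNCS)
		return -1;
	core->funcs[core->nr_funcs].name = name;
	core->funcs[core->nr_funcs].bound = NULL;
	core->nr_funcs++;
	return 0;
}

/* Client side: bind to the first unbound function with a matching name. */
int core_bind(struct core *core, const struct client_drv *drv)
{
	size_t i;

	assert(core && drv);

	for (i = 0; i < core->nr_funcs; i++) {
		struct core_func *func = &core->funcs[i];
		int ret;

		if (func->bound || strcmp(func->name, drv->func_name) != 0)
			continue;
		ret = drv->probe(func);
		if (ret == 0)
			func->bound = drv;
		return ret;
	}
	return -1;	/* nothing to bind to */
}

/* Example clients; the counters only exist to observe probe calls. */
int drm_probed, vfio_probed;

int drm_client_probe(struct core_func *func)
{
	(void)func;
	drm_probed++;
	return 0;
}

int vfio_client_probe(struct core_func *func)
{
	(void)func;
	vfio_probed++;
	return 0;
}
```

If the core publishes both a "drm" and a "vfio" function and only a VFIO
client calls core_bind(), the "drm" function simply stays unbound. That is
the shape of the vGPU host case in the cover letter: the core and one
client are present, and no DRM client is needed.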
Danilo Krummrich
2024-Jun-17 17:30 UTC
[RFC] GPU driver with separate "core" and "DRM" modules
On Fri, Jun 14, 2024 at 03:02:09AM +1000, Ben Skeggs wrote:
> NVIDIA has been exploring ways to better support the effort for an
> upstream kernel mode driver for GPUs that are capable of running GSP-RM
> firmware, since the introduction[1] to Nova.
> 
> Use cases have been identified for which separating the core GPU
> programming out of the full DRM driver stack is a strong requirement
> from our key customers.
> 
> An upstreamed NVIDIA GPU driver should be able to support current and
> emerging customer use cases for vGPU hosts. NVIDIA's vGPU deployments
> to date do not support compute or graphics functionality within the
> hypervisor host, and have no dependency on the Linux graphics subsystem,
> instead implementing the minimal functionality required to run vGPU
> guest VMs.
> 
> For security-sensitive environments such as cloud infrastructure, it's
> important to continue support for running a minimal footprint vGPU host
> driver in a stripped-down / barebones kernel environment.
> 
> This can be achieved by supporting both VFIO and DRM drivers as clients
> of a core driver, without requiring a full-fledged DRM driver (or the
> DRM subsystem itself) to be built into the host kernel.
> 
> A core driver would be responsible for booting and communicating with
> GSP-RM, enumeration of HW configuration, shared/partitioned resource
> management, exception handling, and event dispatch.
> 
> The DRM driver would do all the standard things a DRM driver does, and
> implement GPU memory management (TTM/HMM), KMS, command submission etc,
> as well as providing UAPI for userspace clients. These features would
> be implemented using HW resources allocated from a core driver, rather
> than the DRM driver being directly responsible for HW programming.
> 
> As Nouveau's KMD is already split (in the logical sense) along similar
> lines, we're using it here for the purposes of this RFC to demonstrate
> the feasibility of such an architecture, and open it up for discussion.

Generally, I think that approach is reasonable and I like it. There's only
a few concerns I have for now.

We've already had (and still have) quite a few difficulties due to this
split in Nouveau. Especially when it comes to VMM and handling page tables.
There are cases where the locking architecture must be closely aligned with
the upper layers, i.e. the (VM_BIND) uAPI. Having a separate (local) locking
architecture doesn't work out well in this case due to the implications of
dealing with dma_fences and their signalling paths.

Unfortunately, we can't even argue that we solved this problem in Nouveau.
I think it's fair to say that we found ways (without rewriting /
restructuring a lot of the VMM code to use a more global locking
architecture) to make it work in practice, but surely there are still
conditions that (at least theoretically) can lock things up.

I'm not saying that it's impossible to work this out, but having a strong
separation is likely to make those things quite a bit more difficult.

On the other hand this is a problem we might have to deal with either way,
it shouldn't matter too much having separate modules for VFIO and the GPU
core.

Besides that, do we expect semantical changes in the firmware that can
potentially propagate up in the following sense?

[GSP firmware -> Host GPU core driver -> VFIO driver -> Guest GPU core driver]

If so, how do we deal with those? In the context of ensuring compatibility,
can we ensure this can't lead to increasing maintenance and testing effort
over time?

- Danilo

> 
> A link[2] to a tree containing the patches is below.
> 
> [1] https://lore.kernel.org/all/3ed356488c9b0ca93845501425d427309f4cf616.camel@redhat.com/
> [2] https://gitlab.freedesktop.org/bskeggs/nouveau/-/tree/00.03-module
> 
> *** BLURB HERE ***
> 
> Ben Skeggs (2):
>   drm/nouveau/nvkm: export symbols needed by the drm driver
>   drm/nouveau/nvkm: separate out into nvkm.ko
> 
>  drivers/gpu/drm/nouveau/Kbuild                      |  4 ++--
>  drivers/gpu/drm/nouveau/include/nvkm/core/module.h  |  3 ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c               | 10 +---------
>  drivers/gpu/drm/nouveau/nvkm/core/driver.c          |  1 +
>  drivers/gpu/drm/nouveau/nvkm/core/gpuobj.c          |  2 ++
>  drivers/gpu/drm/nouveau/nvkm/core/mm.c              |  4 ++++
>  drivers/gpu/drm/nouveau/nvkm/device/acpi.c          |  1 +
>  drivers/gpu/drm/nouveau/nvkm/engine/gr/base.c       |  1 +
>  drivers/gpu/drm/nouveau/nvkm/module.c               |  8 ++++++--
>  drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c     |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/bios/pll.c      |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/fb/base.c       |  3 +++
>  drivers/gpu/drm/nouveau/nvkm/subdev/gpio/base.c     |  3 +++
>  drivers/gpu/drm/nouveau/nvkm/subdev/i2c/base.c      |  2 ++
>  drivers/gpu/drm/nouveau/nvkm/subdev/i2c/bus.c       |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/iccsense/base.c |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c    |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/fan.c     |  1 +
>  drivers/gpu/drm/nouveau/nvkm/subdev/volt/base.c     |  1 +
>  19 files changed, 33 insertions(+), 16 deletions(-)
> 
> -- 
> 2.44.0
> 