On Tue, 25 Nov 2025 at 19:15, Christian König <christian.koenig at amd.com> wrote:
>
> On 11/25/25 10:08, Dave Airlie wrote:
> > On Tue, 25 Nov 2025 at 18:11, Christian König <christian.koenig at amd.com> wrote:
> >>
> >> On 11/25/25 08:59, John Hubbard wrote:
> >>> On 11/24/25 11:54 PM, Christian König wrote:
> >>>> On 11/25/25 08:49, Dave Airlie wrote:
> >>>>> On Tue, 25 Nov 2025 at 17:45, Christian König <christian.koenig at amd.com> wrote:
> >>> ...
> >>>> My question is why exactly is nova separated into nova-core and
> >>>> nova-drm? That doesn't seem to be necessary in the first place.
> >>>>
> >>> The idea is that nova-core allows building up a separate software
> >>> stack for VFIO, without pulling in any DRM-specific code that a
> >>> hypervisor (for example) wouldn't need. That makes for a smaller,
> >>> more security-auditable set of code for that case.
> >>
> >> Well that is the same argument used by some AMD team to maintain a
> >> separate out of tree hypervisor for nearly a decade.
> >>
> >> Additional to that the same argument has also been used to justify
> >> the KFD node as alternative API to DRM for compute.
> >>
> >> Both cases have proven to be extremely bad ideas.
> >>
> >> Background is that except for all the legacy stuff the DRM API is
> >> actually very well thought through and it is actually quite hard to
> >> come up with something similarly well.
> >>
> >
> > Well you just answered your own question, why is AMD maintaining GIM
> > instead of solving this upstream with a split model? the nova-core/drm
> > split would be perfect for GIM.
>
> No, it won't.
>
> We have the requirement to work with GEM objects and DMA-buf file
> descriptors in the hypervisor as well.
>
> And my suspicion is that you end up with the same requirements in nova
> as well in which case you end up interchanging handles with DRM as well.
>
> We have seen the same for KFD and it turned out to be an absolutely
> horrible interaction.
>
> > kfd was a terrible idea, and we don't intend to offer userspace
> > multiple APIs with nova, nova-drm will be the primary userspace API
> > provider. nova-core will not provide userspace API, it will provide an
> > API to nova-drm and an API to the vgpu driver which will provide its
> > own userspace API without graphics or compute, just enough to setup
> > VFs.
>
> Ok, then why do you need nova-core in the first place? E.g. where should
> be the vgpu driver and what interface does it provide?

The ask is for a driver for cloud providers to run on their
hypervisors that does just enough to manage the VFs through VFIO
without having a complete drm driver or any drm infrastructure loaded.

The nice pictures are here:
https://lore.kernel.org/all/20250903221111.3866249-1-zhiw@nvidia.com/

You will only be loading one of nova-drm or the vfio driver, at least
in supported systems, depending on the GPU configuration. Whether we
allow users to load both isn't well decided yet.

So far I haven't heard anything about needing dma-buf interactions at
that level, and maybe Zhi has more insight into the future there.

Dave.