Jason Gunthorpe
2025-Oct-01 14:46 UTC
[PATCH 0/2] rust: pci: expose is_virtfn() and reject VFs in nova-core
On Tue, Sep 30, 2025 at 06:26:23PM -0700, John Hubbard wrote:
> On 9/30/25 5:26 PM, Alexandre Courbot wrote:
> > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote:
> >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the
> >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by
> >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to
> >> let NovaCore bind to the VFs, and then have NovaCore call into the upper
> >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no
> >> longer in favor.) So, in order to support that:
> >>
> >> Nova-core must only bind to Physical Functions (PFs) and regular PCI
> >> devices, not to Virtual Functions (VFs) created through SR-IOV.
> >
> > Naive question: will guests also see the passed-through VF as a VF? If
> > so, wouldn't this change also prevents guests from using Nova?
>
> I'm also new to this area. I would expect that guests *must* see
> these as PFs, otherwise...nothing makes any sense.
>
> Maybe Alex Williamson or Jason Gunthorpe (+CC) can chime in.

Driver should never do something like this.

Novacore should work on a VF pretending to be a PF in a VM, and it
should work directly on that same VF outside a VM.

It is not the job of driver to make binding decisions like 'oh VFs of
this devices are usually VFIO so I will fail probe'.

VFIO users should use the disable driver autobinding sysfs before
creating SRIOV instance to prevent this auto binding and then bind
VFIO manually.

Or userspace can manually unbind novacore from the VF and rebind VFIO.

Jason
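For readers unfamiliar with the sysfs workflow mentioned above, here is a
minimal Python sketch of one way it could look: turn off VF driver
autoprobe on the PF, create the VFs, then bind each VF to vfio-pci by hand.
The attribute names (sriov_drivers_autoprobe, sriov_numvfs, virtfnN,
driver_override, drivers_probe) are the standard PCI sysfs ones. The helper
names, the sysfs_root parameter (there only so the logic can be exercised
against a scratch directory), and the BDF in the usage note are assumptions
made for illustration and are not part of the thread.

```python
import os
from pathlib import Path


def write_attr(path, value):
    """Write a single value to a sysfs-style attribute file."""
    Path(path).write_text(value)


def create_vfs_for_vfio(pf_bdf, num_vfs, sysfs_root="/sys", vf_driver="vfio-pci"):
    """Create num_vfs VFs on the PF and bind each one to vf_driver.

    Hypothetical helper: sysfs_root defaults to the real /sys, but can
    point at a scratch tree for testing.
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    pf = pci / "devices" / pf_bdf

    # 1. Tell the PCI core not to probe drivers on new VFs of this PF,
    #    so a native driver does not pick them up when they appear.
    write_attr(pf / "sriov_drivers_autoprobe", "0")

    # 2. Create the VFs.
    write_attr(pf / "sriov_numvfs", str(num_vfs))

    # 3. Each VF is exposed as a virtfnN symlink under the PF directory.
    #    Resolve it to the VF's own address, then bind the chosen driver.
    bound = []
    for i in range(num_vfs):
        vf_bdf = os.path.basename(os.path.realpath(pf / f"virtfn{i}"))
        vf = pci / "devices" / vf_bdf
        write_attr(vf / "driver_override", vf_driver)  # restrict matching
        write_attr(pci / "drivers_probe", vf_bdf)      # trigger the probe
        bound.append(vf_bdf)
    return bound
```

Run as root on a real system, a call such as
create_vfs_for_vfio("0000:01:00.0", 2) would perform the steps above for a
PF at that (made-up) address. The alternative of unbinding an already-bound
driver would instead write the VF address to the VF's driver/unbind
attribute before setting driver_override.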
Alex Williamson
2025-Oct-01 18:16 UTC
[PATCH 0/2] rust: pci: expose is_virtfn() and reject VFs in nova-core
On Wed, 1 Oct 2025 11:46:29 -0300 Jason Gunthorpe <jgg at nvidia.com> wrote:
> On Tue, Sep 30, 2025 at 06:26:23PM -0700, John Hubbard wrote:
> > On 9/30/25 5:26 PM, Alexandre Courbot wrote:
> > > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote:
> > >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the
> > >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by
> > >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to
> > >> let NovaCore bind to the VFs, and then have NovaCore call into the upper
> > >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no
> > >> longer in favor.) So, in order to support that:
> > >>
> > >> Nova-core must only bind to Physical Functions (PFs) and regular PCI
> > >> devices, not to Virtual Functions (VFs) created through SR-IOV.
> > >
> > > Naive question: will guests also see the passed-through VF as a VF? If
> > > so, wouldn't this change also prevents guests from using Nova?
> >
> > I'm also new to this area. I would expect that guests *must* see
> > these as PFs, otherwise...nothing makes any sense.

To answer this specific question, a VF essentially appears as a PF to
the VM. The relationship between a PF and VF is established when SR-IOV
is configured and in part requires understanding the offset and stride
of the VF enumeration, none of which is visible to the VM. The gaps in
VF devices (ex. device ID register) are also emulated in the hypervisor
stack.

> > Maybe Alex Williamson or Jason Gunthorpe (+CC) can chime in.
>
> Driver should never do something like this.
>
> Novacore should work on a VF pretending to be a PF in a VM, and it
> should work directly on that same VF outside a VM.
>
> It is not the job of driver to make binding decisions like 'oh VFs of
> this devices are usually VFIO so I will fail probe'.
>
> VFIO users should use the disable driver autobinding sysfs before
> creating SRIOV instance to prevent this auto binding and then bind
> VFIO manually.
>
> Or userspace can manually unbind novacore from the VF and rebind VFIO.

But this is also true, unbinding "native" host drivers is a fact of
life for vfio and we do have the sriov_drivers_autoprobe sysfs
attributes if a user wants to set a policy for automatically probing VF
drivers for a PF.

I think the question would be whether a "bare" VF really provides a
useful device for nova-core to bind to or if we're just picking it up
because the ID table matches. It's my impression that we require a fair
bit of software emulation/virtualization in the host vGPU driver to turn
the VF into something that can work like a PF in the VM and I don't
know that we can require nova-core to make use of a VF without that
emulation/virtualization layer. For example, aren't VRAM allocations
for a VF done as part of profiling the VF through the vGPU host driver?

Thanks,
Alex
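As a small worked illustration of the "offset and stride of the VF
enumeration" mentioned in this message: on the host, each VF's PCI address
is derived from the PF's address plus the First VF Offset and VF Stride
fields of the PF's SR-IOV capability. The Python sketch below shows that
arithmetic. The function names and all numeric values are made up for
illustration; real offset and stride values come from the device's
configuration space.

```python
def vf_routing_id(pf_bus, pf_devfn, first_vf_offset, vf_stride, vf_index):
    """Return (bus, devfn) for the VF with 0-based index vf_index.

    The routing ID is a 16-bit value: bus in the high byte, devfn in the
    low byte. Adding offset and stride may carry into the bus number.
    """
    rid = (pf_bus << 8) + pf_devfn + first_vf_offset + vf_stride * vf_index
    return (rid >> 8) & 0xFF, rid & 0xFF


def format_bdf(bus, devfn, domain=0):
    """Format a (bus, devfn) pair in the usual dddd:bb:dd.f notation."""
    slot = devfn >> 3   # upper 5 bits of devfn
    func = devfn & 0x7  # lower 3 bits of devfn
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{func:x}"


# Hypothetical PF at 0000:01:00.0 with First VF Offset 4 and VF Stride 1.
for idx in range(2):
    print(format_bdf(*vf_routing_id(0x01, 0x00, 4, 1, idx)))
# prints 0000:01:00.4 then 0000:01:00.5
```

None of this is observable from inside the guest: the VM sees the assigned
function at whatever address the hypervisor places it, with no SR-IOV
capability linking it back to a PF.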