Danilo Krummrich
2025-Oct-02 13:03 UTC
[PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
On Thu Oct 2, 2025 at 2:39 PM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 02:18:36PM +0200, Danilo Krummrich wrote:
>> On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote:
>> > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote:
>> >> Add a "supports_vf" flag to struct pci_driver to let drivers declare
>> >> Virtual Function (VF) support. If a driver does not support VFs, then
>> >> the PCI driver core will not probe() any VFs for that driver's devices.
>> >>
>> >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
>> >> false: drivers must explicitly opt into VF support.
>> >
>> > As I said in the other thread - please no.
>> >
>> > Linux drivers are expected to run on their VFs.
>>
>> The consequence would be that drivers for HW that can export VFs would need to
>> be rejected upstream if they only support the PF, but no VFs. IMHO, that's an
>> unreasonable requirement.
>
> Not rejected, they just need to open code a simple isvf check and fail
> during probe if they really have a (hopefully temporary) problem.

The question is whether it is due to a (temporary) problem, or if it is by
design.

I think it's not unreasonable to have a driver for the PF and a separate driver
for the VFs if they are different enough; the drivers can still share common
code of course.

Surely, you can argue that if they have different enough requirements they
should have different device IDs, but "different enough requirements" is pretty
vague and it's not under our control either.

> This not really a realistic case. Linux running in the VM *should*
> have drivers that operate the VF, and those existing drivers *should*
> work in the PF context.
>
> Drivers that work in VM but not in a host should not be encouraged!!

I agree, we should indeed encourage HW manufacturers to design the HW in a way
that a single driver works in both cases, i.e. less code to maintain, less
surface for bugs, etc., if that is what you mean.

But, if there is another solution for VFs already, e.g. in the case of nova-core
vGPU, why restrict drivers from opting out of VFs? (In a previous reply I
mentioned I prefer opt-in, but you convinced me that it should rather be
opt-out.)

> AFAICT this is even true for novacore, the driver should "work" but
> the VF won't be provisioned today so it should fail startup in some
> way. eg "no vram" or something like that.
>
>> > This temporary
>> > weirdness of novacore should not be elevated to a core behavior that
>> > people will misuse.
>>
>> It's not just nova-core, please see [1].
>>
>> [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/
>
> I responded there, I don't think the reasons those were added to ICE
> and then cargo-culted are very good, not good enough to justify adding
> it to the core code.

Indeed, the justification of ICE is clearly wrong.
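
For illustration, the two approaches debated in this message can be modeled in
a few lines of standalone Rust. This is a simplified userspace sketch, not the
kernel's PCI abstraction: `Device`, `ProbeError`, `core_probe`, `PfOnly` and
`OpenCoded` are hypothetical names, and only the `SUPPORTS_VF` constant (with
its default of false) is taken from the patch description quoted above. How the
real PCI core would skip a VF is an assumption here; the model simply returns
an error standing in for -ENODEV.

```rust
// Simplified userspace model; none of these types are kernel APIs.

/// Stand-in for a PCI device. `is_virtfn` is true for a Virtual Function.
struct Device {
    is_virtfn: bool,
}

#[derive(Debug, PartialEq)]
enum ProbeError {
    NoDevice, // models -ENODEV
}

trait Driver {
    /// Flag named in the patch description; false unless a driver opts in.
    const SUPPORTS_VF: bool = false;

    fn probe(dev: &Device) -> Result<(), ProbeError>;
}

/// Hypothetical core-side dispatch: VFs never reach probe() unless the
/// driver declared VF support.
fn core_probe<D: Driver>(dev: &Device) -> Result<(), ProbeError> {
    if dev.is_virtfn && !D::SUPPORTS_VF {
        return Err(ProbeError::NoDevice);
    }
    D::probe(dev)
}

/// Approach A: rely on the flag; probe() itself contains no VF check.
struct PfOnly;

impl Driver for PfOnly {
    fn probe(_dev: &Device) -> Result<(), ProbeError> {
        Ok(())
    }
}

/// Approach B: the core does not filter; the driver open-codes the check
/// and fails probe on a VF.
struct OpenCoded;

impl Driver for OpenCoded {
    const SUPPORTS_VF: bool = true;

    fn probe(dev: &Device) -> Result<(), ProbeError> {
        if dev.is_virtfn {
            return Err(ProbeError::NoDevice);
        }
        Ok(())
    }
}

fn main() {
    let pf = Device { is_virtfn: false };
    let vf = Device { is_virtfn: true };
    println!("PfOnly    on PF: {:?}", core_probe::<PfOnly>(&pf));
    println!("PfOnly    on VF: {:?}", core_probe::<PfOnly>(&vf));
    println!("OpenCoded on PF: {:?}", core_probe::<OpenCoded>(&pf));
    println!("OpenCoded on VF: {:?}", core_probe::<OpenCoded>(&vf));
}
```

In this model both approaches reject the VF and accept the PF; the difference
is only where the check lives, which is the point under discussion.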
Jason Gunthorpe
2025-Oct-02 13:56 UTC
[PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
On Thu, Oct 02, 2025 at 03:03:38PM +0200, Danilo Krummrich wrote:
> I think it's not unreasonable to have a driver for the PF and a separate driver
> for the VFs if they are different enough; the drivers can still share common
> code of course.

This isn't feasible without different PCI IDs. ICE does this for example where
they have two totally different drivers, and of course two different PCI IDs.

> Surely, you can argue that if they have different enough requirements they
> should have different device IDs, but "different enough requirements" is pretty
> vague and it's not under our control either.

If you want two drivers in Linux you need two PCI IDs. We can't reliably select
different drivers based on VFness because VFness is wiped out during
virtualization.

> But, if there is another solution for VFs already, e.g. in the case of nova-core
> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned
> I prefer opt-in, but you convinced me that it should rather be
> opt-out.)

I think nova-core has a temporary (OOT even!) issue that should be resolved -
that doesn't justify adding core kernel infrastructure that will encourage more
drivers to go away from our kernel design goals of drivers working equally in
host and VM.

Nova core should work in that it probes, detects an unprovisioned VF and then
fails probe. This edge case could perhaps arise in a VM anyhow and needs to be
handled anyhow.

Also, this is all pointless until nova-core gets an sriov_configure callback -
you can't even turn on SRIOV without that!

Jason
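
The "probe, detect an unprovisioned VF, then fail probe" behaviour described
above can be sketched as a small standalone Rust model. It is an illustration
under stated assumptions, not nova-core code: `Device`, `vram_bytes` and
`ProbeError` are hypothetical, with `vram_bytes == 0` standing in for the
"no vram" style of failure mentioned earlier in the thread. The probe function
deliberately ignores `is_virtfn`, since the message above notes that VFness is
not reliably visible once a function is virtualized.

```rust
// Userspace model only; `Device`, `vram_bytes` and `ProbeError` are
// hypothetical and are not kernel APIs.

struct Device {
    /// True for a VF as seen on the host. Per the thread, this information
    /// is wiped out once the function is handed to a guest.
    is_virtfn: bool,
    /// Hypothetical provisioning indicator: zero means nothing was assigned.
    vram_bytes: u64,
}

#[derive(Debug, PartialEq)]
enum ProbeError {
    NoDevice, // models -ENODEV
}

/// Fail based on what the device actually offers, not on whether it is a VF.
fn probe(dev: &Device) -> Result<(), ProbeError> {
    if dev.vram_bytes == 0 {
        return Err(ProbeError::NoDevice);
    }
    Ok(())
}

fn main() {
    let cases = [
        ("host PF", Device { is_virtfn: false, vram_bytes: 1 << 30 }),
        ("host VF, unprovisioned", Device { is_virtfn: true, vram_bytes: 0 }),
        ("guest VF, unprovisioned", Device { is_virtfn: false, vram_bytes: 0 }),
    ];
    for (name, dev) in &cases {
        println!("{name} (is_virtfn={}): {:?}", dev.is_virtfn, probe(dev));
    }
}
```

Because the check keys off the resource rather than the VF bit, the host and
guest cases in this model fail in the same way.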