Danilo Krummrich
2024-Sep-23 22:50 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > 2. Proposal for upstream
> > > =======================
> >
> > What is the strategy in the mid / long term with this?
> >
> > As you know, we're trying to move to Nova and the blockers with the device /
> > driver infrastructure have been resolved and we're able to move forward. Besides
> > that, Dave made great progress on the firmware abstraction side of things.
> >
> > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > vGPU support for Nova?
>
> This is intended to be a real product that customers would use, it is
> not a proof of concept. There is a lot of demand for this kind of
> simplified virtualization infrastructure in the host side.

I see...

> The series
> here is the first attempt at making thin host infrastructure and
> Zhi/etc are doing it with an upstream-first approach.

This is great!

>
> From the VFIO side I would like to see something like this merged in
> nearish future as it would bring a previously out of tree approach to
> be fully intree using our modern infrastructure. This is a big win for
> the VFIO world.
>
> As a commercial product this will be backported extensively to many
> old kernels and that is harder/impossible if it isn't exclusively in
> C. So, I think nova needs to co-exist in some way.

We'll surely not support two drivers for the same thing in the long term,
neither does it make sense, nor is it sustainable.

We have a lot of good reasons why we decided to move forward with Nova as a
successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
talk about this at LPC.

For the short/mid term I think it may be reasonable to start with
Nouveau, but this must be based on some agreements, for instance:

- take responsibility, e.g. commitment to help with maintenance with some of
  NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau

- commitment to help with Nova in general and, once applicable, move the vGPU
  parts over to Nova

But I think the very last one naturally happens if we stop further support for
new HW in Nouveau at some point.

>
> Jason
>
Jason Gunthorpe
2024-Sep-24 16:41 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> > From the VFIO side I would like to see something like this merged in
> > nearish future as it would bring a previously out of tree approach to
> > be fully intree using our modern infrastructure. This is a big win for
> > the VFIO world.
> >
> > As a commercial product this will be backported extensively to many
> > old kernels and that is harder/impossible if it isn't exclusively in
> > C. So, I think nova needs to co-exist in some way.
>
> We'll surely not support two drivers for the same thing in the long term,
> neither does it make sense, nor is it sustainable.

What is being done here is the normal correct kernel thing to do.
Refactor the shared core code into a module and stick higher level
stuff on top of it. Ideally Nova/Nouveau would exist as peers
implementing DRM subsystem on this shared core infrastructure. We've
done this sort of thing before in other places in the kernel. It has
been proven to work well.

So, I'm not sure why you think there should be two drivers in the long
term? Do you have some technical reason why Nova can't fit into this
modular architecture?

Regardless, assuming Nova will eventually propose merging duplicated
bootup code then I suggest it should be able to fully replace the C
code with a kconfig switch and provide C compatible interfaces for
VFIO. When Rust is sufficiently mature we can consider a deprecation
schedule for the C version.

I agree duplication doesn't make sense, but if it is essential to make
everyone happy then we should do it to accommodate the ongoing Rust
experiment.

> We have a lot of good reasons why we decided to move forward with Nova as a
> successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
> talk about this at LPC.

I know, but this series is adding a VFIO driver to the kernel, and a
complete Nova driver doesn't even exist yet. It is fine to think about
future plans, but let's not get too far ahead of ourselves here..

> For the short/mid term I think it may be reasonable to start with
> Nouveau, but this must be based on some agreements, for instance:
>
> - take responsibility, e.g. commitment to help with maintenance with some of
>   NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau

I fully expect NVIDIA teams to own this core driver and VFIO parts. I
see there are no changes to the MAINTAINERS file in this RFC, that
will need to be corrected.

> - commitment to help with Nova in general and, once applicable, move the vGPU
>   parts over to Nova

I think you will get help with Nova based on its own merit, but I
don't like where you are going with this. Linus has had negative
things to say about this sort of cross-linking and I agree with
him. We should not be trying to extract unrelated promises on Nova as
a condition for progressing a VFIO series. :\

> But I think the very last one naturally happens if we stop further support for
> new HW in Nouveau at some point.

I expect the core code would continue to support new HW going forward
to support the VFIO driver, even if nouveau doesn't use it, until Rust
reaches some full ecosystem readiness for the server space.

There are going to be a lot of users of this code, let's not rush to
harm them please.

Fortunately there is no use case for DRM and VFIO to coexist in a
hypervisor, so this does not turn into such a technical problem like
most other dual-driver situations.

Jason