Danilo Krummrich
2024-Sep-24 19:56 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Tue, Sep 24, 2024 at 01:41:51PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> > > From the VFIO side I would like to see something like this merged in
> > > nearish future as it would bring a previously out of tree approach to
> > > be fully intree using our modern infrastructure. This is a big win for
> > > the VFIO world.
> > >
> > > As a commercial product this will be backported extensively to many
> > > old kernels and that is harder/impossible if it isn't exclusively in
> > > C. So, I think nova needs to co-exist in some way.
> >
> > We'll surely not support two drivers for the same thing in the long term,
> > neither does it make sense, nor is it sustainable.
>
> What is being done here is the normal correct kernel thing to
> do. Refactor the shared core code into a module and stick higher level
> stuff on top of it. Ideally Nova/Nouveau would exist as peers
> implementing DRM subsystem on this shared core infrastructure. We've
> done this sort of thing before in other places in the kernel. It has
> been proven to work well.

So, that's where you have the wrong understanding of what we're working on: You
seem to think that Nova is just another DRM subsystem layer on top of the NVKM
parts (what you call the core driver) of Nouveau.

But the whole point of Nova is to replace the NVKM parts of Nouveau, since
that's where the problems we want to solve reside in.

>
> So, I'm not sure why you think there should be two drivers in the long
> term? Do you have some technical reason why Nova can't fit into this
> modular architecture?

Like I said above, the whole point of Nova is to be the core driver; the DRM
parts on top are more like "the icing on the cake".

>
> Regardless, assuming Nova will eventually propose merging duplicated
> bootup code then I suggest it should be able to fully replace the C
> code with a kconfig switch and provide C compatible interfaces for
> VFIO. When Rust is sufficiently mature we can consider a deprecation
> schedule for the C version.
>
> I agree duplication doesn't make sense, but if it is essential to make
> everyone happy then we should do it to accommodate the ongoing Rust
> experiment.
>
> > We have a lot of good reasons why we decided to move forward with Nova as a
> > successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
> > talk about this at LPC.
>
> I know, but this series is adding a VFIO driver to the kernel, and a

I have no concerns regarding the VFIO driver; this is about the new features
that you intend to add to Nouveau.

> complete Nova driver doesn't even exist yet. It is fine to think about
> future plans, but let's not get too far ahead of ourselves here.

Well, that's true, but we can't just add new features to something that has
been agreed to be replaced without having a strategy for this for the
successor.

>
> > For the short/mid term I think it may be reasonable to start with
> > Nouveau, but this must be based on some agreements, for instance:
> >
> > - take responsibility, e.g. commitment to help with maintenance with some of
> >   NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau
>
> I fully expect NVIDIA teams to own this core driver and VFIO parts. I
> see there are no changes to the MAINTAINERS file in this RFC, that
> will need to be corrected.

Well, I did not say to just take over the biggest part of Nouveau.

Currently - and please correct me if I'm wrong - you make it sound to me as if
you're not willing to respect the decisions that have been taken by Nouveau and
DRM maintainers.

>
> > - commitment to help with Nova in general and, once applicable, move the vGPU
> >   parts over to Nova
>
> I think you will get help with Nova based on its own merit, but I
> don't like where you are going with this. Linus has had negative
> things to say about this sort of cross-linking and I agree with
> him. We should not be trying to extract unrelated promises on Nova as
> a condition for progressing a VFIO series. :\

No cross-linking, no unrelated promises.

Again, we're working on a successor of Nouveau and if we keep adding features to
Nouveau in the meantime, we have to have a strategy for the transition,
otherwise we're effectively just ignoring this decision.

So, I really need you to respect the fact that there has been a decision for a
successor and that this *is* in fact relevant for all major changes to Nouveau
as well.

Once you do this, we get the chance to work things out for the short/mid term
and for the long term, so that everyone benefits.

I encourage that NVIDIA wants to move things upstream and I'm absolutely willing
to collaborate and help with the use-cases and goals NVIDIA has. But it really
has to be a collaboration and this starts with acknowledging the goals of *each
other*.

>
> > But I think the very last one naturally happens if we stop further support for
> > new HW in Nouveau at some point.
>
> I expect the core code would continue to support new HW going forward
> to support the VFIO driver, even if nouveau doesn't use it, until Rust
> reaches some full ecosystem readiness for the server space.

From an upstream perspective the kernel doesn't need to consider OOT drivers,
i.e. the guest driver.

This doesn't mean that we can't work something out for a seamless transition
though.

But again, this can only really work if we acknowledge the goals of each other.

>
> There are going to be a lot of users of this code, let's not rush to
> harm them please.

Please abstain from such kind of unconstructive insinuations; it's ridiculous to
imply that upstream kernel developers and maintainers would harm the users of
NVIDIA GPUs.

>
> Fortunately there is no use case for DRM and VFIO to coexist in a
> hypervisor, so this does not turn into such a technical problem like
> most other dual-driver situations.
>
> Jason
Dave Airlie
2024-Sep-24 22:52 UTC
On Wed, 25 Sept 2024 at 05:57, Danilo Krummrich <dakr at kernel.org> wrote:
>
> On Tue, Sep 24, 2024 at 01:41:51PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> > > > From the VFIO side I would like to see something like this merged in
> > > > nearish future as it would bring a previously out of tree approach to
> > > > be fully intree using our modern infrastructure. This is a big win for
> > > > the VFIO world.
> > > >
> > > > As a commercial product this will be backported extensively to many
> > > > old kernels and that is harder/impossible if it isn't exclusively in
> > > > C. So, I think nova needs to co-exist in some way.
> > >
> > > We'll surely not support two drivers for the same thing in the long term,
> > > neither does it make sense, nor is it sustainable.
> >
> > What is being done here is the normal correct kernel thing to
> > do. Refactor the shared core code into a module and stick higher level
> > stuff on top of it. Ideally Nova/Nouveau would exist as peers
> > implementing DRM subsystem on this shared core infrastructure. We've
> > done this sort of thing before in other places in the kernel. It has
> > been proven to work well.
>
> So, that's where you have the wrong understanding of what we're working on: You
> seem to think that Nova is just another DRM subsystem layer on top of the NVKM
> parts (what you call the core driver) of Nouveau.
>
> But the whole point of Nova is to replace the NVKM parts of Nouveau, since
> that's where the problems we want to solve reside in.

Just to re-emphasise for Jason, who might not be as across this stuff: NVKM
replacement with Rust is the main reason for the nova project. The driving
force for nova is 100% the unstable NVIDIA firmware API, and the ability to use
Rust proc-macros to hide the NVIDIA instability instead of trying to do it in
C, either with generators or by abusing C macros (which I don't think are
sufficient).
The lower-level nvkm driver needs to start being in Rust before we can add
support for newer stuff.

Now, there is possibly some scope for evolving the Rust pieces in it as Rust
wrapped in C APIs, to make things easier for backports or to avoid some
pitfalls, but that is a discussion we need to have here.

I think the idea of a nova drm and nova core driver architecture is acceptable
to most of us, but in the long term, trying to maintain a nouveau-based nvkm is
definitely not acceptable due to the unstable firmware APIs.

Dave.
Jason Gunthorpe
2024-Sep-25 00:53 UTC
On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:
> Currently - and please correct me if I'm wrong - you make it sound to me as if
> you're not willing to respect the decisions that have been taken by Nouveau and
> DRM maintainers.

I've never said anything about your work. Go do Nova, have fun.

I'm just not agreeing to being forced into taking Rust dependencies in
VFIO because Nova is participating in the Rust Experiment.

I think the reasonable answer is to accept some code duplication, or
try to consolidate around a small C core. I understand this is different
from what you may have planned so far for Nova, but all projects are
subject to community feedback, especially when faced with new
requirements.

I think this discussion is getting a little overheated; there is lots
of space here for everyone to do their things. Let's not get too excited.

> I encourage that NVIDIA wants to move things upstream and I'm absolutely willing
> to collaborate and help with the use-cases and goals NVIDIA has. But it really
> has to be a collaboration and this starts with acknowledging the goals of *each
> other*.

I've always acknowledged Nova's goal - it is fine.

It is just quite incompatible with the VFIO side requirement of no Rust
in our stack until the ecosystem can consume it.

I believe there is no reason we can't find an agreeable compromise.

> > I expect the core code would continue to support new HW going forward
> > to support the VFIO driver, even if nouveau doesn't use it, until Rust
> > reaches some full ecosystem readiness for the server space.
>
> From an upstream perspective the kernel doesn't need to consider OOT drivers,
> i.e. the guest driver.

?? VFIO already took the decision that it is agnostic to what is
running in the VM. Run Windows-only VMs for all we care; it is still
supposed to be virtualized correctly.

> > There are going to be a lot of users of this code, let's not rush to
> > harm them please.
>
> Please abstain from such kind of unconstructive insinuations; it's ridiculous to
> imply that upstream kernel developers and maintainers would harm the users of
> NVIDIA GPUs.

You literally just said you'd want to effectively block usable VFIO
support for new GPU HW when "we stop further support for new HW in
Nouveau at some point" and "move the vGPU parts over to Nova(& rust)".

I don't agree to that: it harms VFIO users and does not acknowledge
that conflicting goals exist.

VFIO will decide when it starts to depend on Rust; Nova should not
force that decision on VFIO. They are very different ecosystems with
different needs.

Jason