Greg KH
2024-Sep-26 12:54 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Thu, Sep 26, 2024 at 09:42:39AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 11:14:27AM +0200, Greg KH wrote:
> > On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > > > 2. Proposal for upstream
> > > > > =======================
> > > >
> > > > What is the strategy in the mid / long term with this?
> > > >
> > > > As you know, we're trying to move to Nova and the blockers with the device /
> > > > driver infrastructure have been resolved and we're able to move forward. Besides
> > > > that, Dave made great progress on the firmware abstraction side of things.
> > > >
> > > > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > > > vGPU support for Nova?
> > >
> > > This is intended to be a real product that customers would use, it is
> > > not a proof of concept. There is alot of demand for this kind of
> > > simplified virtualization infrastructure in the host side. The series
> > > here is the first attempt at making thin host infrastructure and
> > > Zhi/etc are doing it with an upstream-first approach.
> > >
> > > From the VFIO side I would like to see something like this merged in
> > > nearish future as it would bring a previously out of tree approach to
> > > be fully intree using our modern infrastructure. This is a big win for
> > > the VFIO world.
> > >
> > > As a commercial product this will be backported extensively to many
> > > old kernels and that is harder/impossible if it isn't exclusively in
> > > C. So, I think nova needs to co-exist in some way.
> >
> > Please never make design decisions based on old ancient commercial
> > kernels that have any relevance to upstream kernel development
> > today.
>
> Greg, you are being too extreme. Those "ancient commercial kernels"
> have a huge relevance to alot of our community because they are the
> users that actually run the code we are building and pay for it to be
> created. Yes we usually (but not always!) push back on accommodations
> upstream, but taking hard dependencies on rust is currently a very
> different thing.

That's fine, but again, do NOT make design decisions based on what you
can, and can not, feel you can slide by one of these companies to get it
into their old kernels. That's what I take objection to here.

Also always remember please, that the % of overall Linux kernel
installs, even counting out Android and embedded, is VERY tiny for these
companies. The huge % overall is doing the "right thing" by using
upstream kernels. And with the laws in place now that % is only going
to grow and those older kernels will rightfully fall away into even
smaller %.

I know those companies pay for many developers, I'm not saying that
their contributions are any less or more important than others, they all
are equal. You wouldn't want design decisions for a patch series to be
dictated by some really old Yocto kernel restrictions that are only in
autos, right? We are a large community, that's what I'm saying.

> Otherwise, let's slow down here. Nova is still years away from being
> finished. Nouveau is the in-tree driver for this HW. This series
> improves on Nouveau. We are definitely not at the point of refusing
> new code because it is not writte in Rust, RIGHT?

No, I do object to "we are ignoring the driver being proposed by the
developers involved for this hardware by adding to the old one instead"
which it seems like is happening here.

Anyway, let's focus on the code, there's already real issues with this
patch series as pointed out by me and others that need to be addressed
before it can go anywhere.

thanks,

greg k-h
Danilo Krummrich
2024-Sep-26 13:07 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> On Thu, Sep 26, 2024 at 09:42:39AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 26, 2024 at 11:14:27AM +0200, Greg KH wrote:
> > > On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > > > > 2. Proposal for upstream
> > > > > > =======================
> > > > >
> > > > > What is the strategy in the mid / long term with this?
> > > > >
> > > > > As you know, we're trying to move to Nova and the blockers with the device /
> > > > > driver infrastructure have been resolved and we're able to move forward. Besides
> > > > > that, Dave made great progress on the firmware abstraction side of things.
> > > > >
> > > > > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > > > > vGPU support for Nova?
> > > >
> > > > This is intended to be a real product that customers would use, it is
> > > > not a proof of concept. There is alot of demand for this kind of
> > > > simplified virtualization infrastructure in the host side. The series
> > > > here is the first attempt at making thin host infrastructure and
> > > > Zhi/etc are doing it with an upstream-first approach.
> > > >
> > > > From the VFIO side I would like to see something like this merged in
> > > > nearish future as it would bring a previously out of tree approach to
> > > > be fully intree using our modern infrastructure. This is a big win for
> > > > the VFIO world.
> > > >
> > > > As a commercial product this will be backported extensively to many
> > > > old kernels and that is harder/impossible if it isn't exclusively in
> > > > C. So, I think nova needs to co-exist in some way.
> > >
> > > Please never make design decisions based on old ancient commercial
> > > kernels that have any relevance to upstream kernel development
> > > today.
> >
> > Greg, you are being too extreme. Those "ancient commercial kernels"
> > have a huge relevance to alot of our community because they are the
> > users that actually run the code we are building and pay for it to be
> > created. Yes we usually (but not always!) push back on accommodations
> > upstream, but taking hard dependencies on rust is currently a very
> > different thing.
>
> That's fine, but again, do NOT make design decisions based on what you
> can, and can not, feel you can slide by one of these companies to get it
> into their old kernels. That's what I take objection to here.
>
> Also always remember please, that the % of overall Linux kernel
> installs, even counting out Android and embedded, is VERY tiny for these
> companies. The huge % overall is doing the "right thing" by using
> upstream kernels. And with the laws in place now that % is only going
> to grow and those older kernels will rightfully fall away into even
> smaller %.
>
> I know those companies pay for many developers, I'm not saying that
> their contributions are any less or more important than others, they all
> are equal. You wouldn't want design decisions for a patch series to be
> dictated by some really old Yocto kernel restrictions that are only in
> autos, right? We are a large community, that's what I'm saying.
>
> > Otherwise, let's slow down here. Nova is still years away from being
> > finished. Nouveau is the in-tree driver for this HW. This series
> > improves on Nouveau. We are definitely not at the point of refusing
> > new code because it is not writte in Rust, RIGHT?

Just a reminder on what I said and not said, respectively. I never said we
can't support this in Nouveau for the short and mid term.

But we can't add new features and support new use-cases in Nouveau *without*
considering the way forward to the new driver.

>
> No, I do object to "we are ignoring the driver being proposed by the
> developers involved for this hardware by adding to the old one instead"
> which it seems like is happening here.
>
> Anyway, let's focus on the code, there's already real issues with this
> patch series as pointed out by me and others that need to be addressed
> before it can go anywhere.
>
> thanks,
>
> greg k-h
>
Jason Gunthorpe
2024-Sep-26 14:40 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> That's fine, but again, do NOT make design decisions based on what you
> can, and can not, feel you can slide by one of these companies to get it
> into their old kernels. That's what I take objection to here.

It is not slide by. It is a recognition that participating in the
community gives everyone value. If you excessively deny value from one
side they will have no reason to participate.

In this case the value is that, with enough light work, the
kernel-fork community can deploy this code to their users. This has
been the accepted bargain for a long time now.

There is a great big question mark over Rust regarding what impact it
actually has on this dynamic. It is definitely not just backport a few
hundred upstream patches. There is clearly new upstream development
work needed still - arch support being a very obvious one.

> Also always remember please, that the % of overall Linux kernel
> installs, even counting out Android and embedded, is VERY tiny for these
> companies. The huge % overall is doing the "right thing" by using
> upstream kernels. And with the laws in place now that % is only going
> to grow and those older kernels will rightfully fall away into even
> smaller %.

Who is "doing the right thing"? That is not what I see, we sell
server HW to *everyone*. There are a couple sites that are "near"
upstream, but that is not too common. Everyone is running some kind of
kernel fork.

I dislike this generalization you do with % of users. Almost 100% of
NVIDIA server HW are running forks. I would estimate around 10% is
above a 6.0 baseline. It is not tiny either, NVIDIA sold like $60B of
server HW running Linux last year with this kind of demographic. So
did Intel, AMD, etc.

I would not describe this as "VERY tiny". Maybe you mean RHEL-alike
specifically, and yes, they are a diminishing install share. However,
the hyperscale companies more than make up for that with their
internal secret proprietary forks :(

> > Otherwise, let's slow down here. Nova is still years away from being
> > finished. Nouveau is the in-tree driver for this HW. This series
> > improves on Nouveau. We are definitely not at the point of refusing
> > new code because it is not writte in Rust, RIGHT?
>
> No, I do object to "we are ignoring the driver being proposed by the
> developers involved for this hardware by adding to the old one instead"
> which it seems like is happening here.

That is too harsh. We've consistently taken a community position that
OOT stuff doesn't matter, and yes that includes OOT stuff that people
we trust and respect are working on. Until it is ready for submission,
and ideally merged, it is an unknown quantity. Good well meaning
people routinely drop their projects, good projects run into
unexpected roadblocks, and life happens.

Nova is not being ignored, there is dialog, and yes some disagreement.

Again, nobody here is talking about disrupting Nova. We just want to
keep going as-is until we can all agree together it is ready to make a
change.

Jason