Tian, Kevin
2024-Sep-26 06:43 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Monday, September 23, 2024 11:02 PM
>
> On Mon, Sep 23, 2024 at 06:22:33AM +0000, Tian, Kevin wrote:
> > > From: Zhi Wang <zhiw at nvidia.com>
> > > Sent: Sunday, September 22, 2024 8:49 PM
> > >
> > [...]
> > >
> > > The NVIDIA vGPU VFIO module together with VFIO sits on VFs, provides
> > > extended management and features, e.g. selecting the vGPU types, support
> > > live migration and driver warm update.
> > >
> > > Like other devices that VFIO supports, VFIO provides the standard
> > > userspace APIs for device lifecycle management and advance feature
> > > support.
> > >
> > > The NVIDIA vGPU manager provides necessary support to the NVIDIA vGPU VFIO
> > > variant driver to create/destroy vGPUs, query available vGPU types, select
> > > the vGPU type, etc.
> > >
> > > On the other side, NVIDIA vGPU manager talks to the NVIDIA GPU core driver,
> > > which provide necessary support to reach the HW functions.
> > >
> >
> > I'm not sure VFIO is the right place to host the NVIDIA vGPU manager.
> > It's very NVIDIA specific and naturally fit in the PF driver.
>
> drm isn't a particularly logical place for that either :|
>

This RFC doesn't expose any new uAPI in the vGPU manager, e.g. the vGPU
type is hard-coded to L40-24Q. In this way the boundary between code in
VFIO and code in the PF driver is probably more of a vendor-specific choice.

However, according to the cover letter, it's reasonable for a future
extension to implement new uAPI for the admin to select the vGPU type and
potentially do more manual configuration before the target VF can be used:

Then there comes an open whether VFIO is a right place to host such
vendor specific provisioning interface. The existing mdev type based
provisioning mechanism was considered a bad fit already.

IIRC the previous discussion came to suggest putting the provisioning
interface in the PF driver. There may be chance to generalize and
move to VFIO but no idea what it will be until multiple drivers already
demonstrate their own implementations as the base for discussion.

But now seems you prefer to vendors putting their own provisioning
interface in VFIO directly?

Thanks
Kevin
Jason Gunthorpe
2024-Sep-26 12:55 UTC
[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
On Thu, Sep 26, 2024 at 06:43:44AM +0000, Tian, Kevin wrote:
> Then there comes an open whether VFIO is a right place to host such
> vendor specific provisioning interface. The existing mdev type based
> provisioning mechanism was considered a bad fit already.
>
> IIRC the previous discussion came to suggest putting the provisioning
> interface in the PF driver. There may be chance to generalize and
> move to VFIO but no idea what it will be until multiple drivers already
> demonstrate their own implementations as the base for discussion.

I am looking at fwctl to do a lot of this in the SRIOV world. You'd
provision the VF prior to opening VFIO using the fwctl interface, and
VFIO would perceive a VF that has exactly the required properties.

That is at least for SRIOV, where the VM is talking directly to device
FW; mdev/paravirtualization would be different.

> But now seems you prefer to vendors putting their own provisioning
> interface in VFIO directly?

Maybe not, just that drm isn't the right place either. If we do fwctl
stuff then the VF provisioning would be done through a fwctl driver.

I'm not entirely sure yet what this whole 'mgr' component is actually
doing though.

Jason