thr3ads.net - Nouveau - [RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support [Sep 2024]

Tian, Kevin

2024-Sep-26 06:43 UTC

[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support

> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Monday, September 23, 2024 11:02 PM
> 
> On Mon, Sep 23, 2024 at 06:22:33AM +0000, Tian, Kevin wrote:
> > > From: Zhi Wang <zhiw at nvidia.com>
> > > Sent: Sunday, September 22, 2024 8:49 PM
> > >
> > [...]
> > >
> > > The NVIDIA vGPU VFIO module together with VFIO sits on VFs, provides
> > > extended management and features, e.g. selecting the vGPU types, support
> > > live migration and driver warm update.
> > >
> > > Like other devices that VFIO supports, VFIO provides the standard
> > > userspace APIs for device lifecycle management and advance feature
> > > support.
> > >
> > > The NVIDIA vGPU manager provides necessary support to the NVIDIA vGPU VFIO
> > > variant driver to create/destroy vGPUs, query available vGPU types, select
> > > the vGPU type, etc.
> > >
> > > On the other side, NVIDIA vGPU manager talks to the NVIDIA GPU core driver,
> > > which provide necessary support to reach the HW functions.
> > >
> >
> > I'm not sure VFIO is the right place to host the NVIDIA vGPU manager.
> > It's very NVIDIA specific and naturally fit in the PF driver.
> 
> drm isn't a particularly logical place for that either :|
> 
This RFC doesn't expose any new uAPI in the vGPU manager; for example,
the vGPU type is hard-coded to L40-24Q. Given that, where to draw the
boundary between code in VFIO and code in the PF driver is probably more
of a vendor-specific choice.

However, according to the cover letter, it's reasonable to expect a future
extension to implement new uAPI for the admin to select the vGPU type and
potentially do more manual configuration before the target VF can be used.

That raises an open question: whether VFIO is the right place to host such
a vendor-specific provisioning interface. The existing mdev-type-based
provisioning mechanism was already considered a bad fit.

IIRC the previous discussion ended up suggesting that the provisioning
interface go in the PF driver. There may be a chance to generalize it and
move it to VFIO, but there is no idea what that would look like until
multiple drivers have demonstrated their own implementations as a base
for discussion.

But now it seems you prefer vendors putting their own provisioning
interface in VFIO directly?

Thanks
Kevin

Jason Gunthorpe

2024-Sep-26 12:55 UTC

[RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support

On Thu, Sep 26, 2024 at 06:43:44AM +0000, Tian, Kevin wrote:
> Then there comes an open whether VFIO is a right place to host such
> vendor specific provisioning interface. The existing mdev type based
> provisioning mechanism was considered a bad fit already.
> IIRC the previous discussion came to suggest putting the provisioning
> interface in the PF driver. There may be chance to generalize and
> move to VFIO but no idea what it will be until multiple drivers already
> demonstrate their own implementations as the base for discussion.

I am looking at fwctl to do a lot of this in the SRIOV world.

You'd provision the VF prior to opening VFIO using the fwctl interface
and the VFIO would perceive a VF that has exactly the required
properties. That is at least the case for SRIOV, where the VM is talking
directly to device FW; mdev/paravirtualization would be different.

> But now seems you prefer to vendors putting their own provisioning
> interface in VFIO directly?

Maybe not, just that drm isn't the right place either. If we do
fwctl stuff then the VF provisioning would be done through a fwctl
driver.

I'm not entirely sure yet what this whole 'mgr' component is actually
doing though.

Jason
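
The provision-before-open ordering discussed in this thread can be
sketched as a toy Python model. This is only an illustration under stated
assumptions: it does not use the real fwctl or VFIO interfaces, the names
VirtualFunction, provision_vf and vfio_open are hypothetical, and
rejecting re-provisioning after open is an assumption rather than
something stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualFunction:
    """Toy stand-in for an SR-IOV VF; not a kernel object."""
    bdf: str                          # PCI address, illustrative only
    vgpu_type: Optional[str] = None   # set by the provisioning step
    opened_by_vfio: bool = False


class ProvisioningError(RuntimeError):
    """Raised when the provision/open ordering is violated."""


def provision_vf(vf: VirtualFunction, vgpu_type: str) -> None:
    """Hypothetical provisioning step, done outside VFIO
    (the role the thread assigns to a PF or fwctl driver)."""
    if vf.opened_by_vfio:
        # Assumption: a VF already handed to VFIO cannot be re-provisioned.
        raise ProvisioningError("provision the VF before opening it")
    vf.vgpu_type = vgpu_type


def vfio_open(vf: VirtualFunction) -> str:
    """Hypothetical open step: VFIO only sees an already-provisioned VF."""
    if vf.vgpu_type is None:
        raise ProvisioningError("VF has not been provisioned")
    vf.opened_by_vfio = True
    return vf.vgpu_type


if __name__ == "__main__":
    vf = VirtualFunction(bdf="0000:3b:00.4")
    provision_vf(vf, "L40-24Q")   # vGPU type named in the RFC discussion
    print(vfio_open(vf))
```

In this model the open path carries no vendor-specific provisioning
logic at all, which is the separation the replies above are arguing
about.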
