thr3ads.net - Linux Virtualization - [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices [Oct 2023]

If this information is useful, please help other people find it:
Share via:

Yishai Hadas

2023-Oct-18 09:01 UTC

[PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

On 17/10/2023 23:24, Alex Williamson wrote:> On Tue, 17 Oct 2023 16:42:17 +0300
> Yishai Hadas <yishaih at nvidia.com> wrote:
>> +static int virtiovf_pci_probe(struct pci_dev *pdev,
>> +			      const struct pci_device_id *id)
>> +{
>> +	const struct vfio_device_ops *ops = &virtiovf_acc_vfio_pci_ops;
>> +	struct virtiovf_pci_core_device *virtvdev;
>> +	int ret;
>> +
>> +	if (pdev->is_virtfn &&
virtiovf_support_legacy_access(pdev) &&
>> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
>> +		ops = &virtiovf_acc_vfio_pci_tran_ops;
>
> This is still an issue for me, it's a very narrow use case where we
> have a modern device and want to enable legacy support.  Implementing an
> IO BAR and mangling the device ID seems like it should be an opt-in,
> not standard behavior for any compatible device.  Users should
> generally expect that the device they see in the host is the device
> they see in the guest.  They might even rely on that principle.
Users here mainly refer to cloud operators.

We may assume, I believe, that they will be fine with seeing a 
transitional device in the guest as they would like to get the legacy IO 
support for their system.

However, we can still consider supplying a configuration knob in the 
device layer (e.g. in the DPU side) to let a cloud operator turning off 
the legacy capability.

In that case upon probe() of the vfio-virtio driver, we'll just pick-up 
the default vfio-pci 'ops' and in the guest we may have the same device 
ID as of in the host.

With that approach we may not require a HOST side control (i.e. sysfs, 
etc.), but stay with a s device control based on its user manual.

At the end, we don't expect any functional issue nor any compatible 
problem with the new driver, both modern and legacy drivers can work in 
the guest.

Can that work for you ?
>
> We can't use the argument that users wanting the default device should
> use vfio-pci rather than virtio-vfio-pci because we've already defined
> the algorithm by which libvirt should choose a variant driver for a
> device.  libvirt will choose this driver for all virtio-net devices.
>
> This driver effectively has the option to expose two different profiles
> for the device, native or transitional.  We've discussed profile
> support for variant drivers previously as an equivalent functionality
> to mdev types, but the only use case for this currently is out-of-tree.
> I think this might be the opportunity to define how device profiles are
> exposed and selected in a variant driver.
>
> Jason had previously suggested a devlink interface for this, but I
> understand that path had been shot down by devlink developers.  Another
> obvious option is sysfs, where we might imagine an optional
"profiles"
> directory, perhaps under vfio-dev.  Attributes of "available" and
> "current" could allow discovery and selection of a profile
similar to
> mdev types.
Referring to the sysfs option,

Do you expect the sysfs data to effect the libvirt decision ? may that 
require changes in libvirt ?

In addition,
May that be too late as the sysfs entry will be created upon driver 
binding by libvirt or that we have in mind some other option to control 
with that ?

Jason,
Can you please comment here as well ?
> Is this where we should head with this or are there other options to
> confine this transitional behavior?
>
> BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?
"acc" is just a short-cut to "access", see also here[1] a
similar usage.

[1] 
https://elixir.bootlin.com/linux/v6.6-rc6/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1380
>
>> +
>> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device,
core_device.vdev,
>> +				     &pdev->dev, ops);
>> +	if (IS_ERR(virtvdev))
>> +		return PTR_ERR(virtvdev);
>> +
>> +	dev_set_drvdata(&pdev->dev, &virtvdev->core_device);
>> +	ret = vfio_pci_core_register_device(&virtvdev->core_device);
>> +	if (ret)
>> +		goto out;
>> +	return 0;
>> +out:
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +	return ret;
>> +}
>> +
>> +static void virtiovf_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct virtiovf_pci_core_device *virtvdev =
dev_get_drvdata(&pdev->dev);
>> +
>> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
>> +	vfio_put_device(&virtvdev->core_device.vdev);
>> +}
>> +
>> +static const struct pci_device_id virtiovf_pci_table[] = {
>> +	/* Only virtio-net is supported/tested so far */
>> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET,
0x1041) },
>> +	{}
>> +};
>> +
>> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
>> +
>> +static struct pci_driver virtiovf_pci_driver = {
>> +	.name = KBUILD_MODNAME,
>> +	.id_table = virtiovf_pci_table,
>> +	.probe = virtiovf_pci_probe,
>> +	.remove = virtiovf_pci_remove,
>> +	.err_handler = &vfio_pci_core_err_handlers,
>> +	.driver_managed_dma = true,
>> +};
>> +
>> +module_pci_driver(virtiovf_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yishai Hadas <yishaih at nvidia.com>");
>> +MODULE_DESCRIPTION(
>> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device
family");
> Not yet "family" per the device table.  Thanks,
Right

How about dropping the word "family" and say instead ".. for
VIRTIO
devices" as we have in the Kconfig in that patch [1] ?

[1] "This provides support for exposing VIRTIO VF devices .."

Yishai
> Alex
>

Alex Williamson

2023-Oct-18 12:51 UTC

head link

[PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

On Wed, 18 Oct 2023 12:01:57 +0300
Yishai Hadas <yishaih at nvidia.com> wrote:
> On 17/10/2023 23:24, Alex Williamson wrote:
> > On Tue, 17 Oct 2023 16:42:17 +0300
> > Yishai Hadas <yishaih at nvidia.com> wrote:  
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> +			      const struct pci_device_id *id)
> >> +{
> >> +	const struct vfio_device_ops *ops =
&virtiovf_acc_vfio_pci_ops;
> >> +	struct virtiovf_pci_core_device *virtvdev;
> >> +	int ret;
> >> +
> >> +	if (pdev->is_virtfn &&
virtiovf_support_legacy_access(pdev) &&
> >> +	    !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> >> +		ops = &virtiovf_acc_vfio_pci_tran_ops;  
> >
> > This is still an issue for me, it's a very narrow use case where
we
> > have a modern device and want to enable legacy support.  Implementing
an
> > IO BAR and mangling the device ID seems like it should be an opt-in,
> > not standard behavior for any compatible device.  Users should
> > generally expect that the device they see in the host is the device
> > they see in the guest.  They might even rely on that principle.  
> 
> Users here mainly refer to cloud operators.
> 
> We may assume, I believe, that they will be fine with seeing a 
> transitional device in the guest as they would like to get the legacy IO 
> support for their system.
> 
> However, we can still consider supplying a configuration knob in the 
> device layer (e.g. in the DPU side) to let a cloud operator turning off 
> the legacy capability.
This is a driver that implements to the virtio standard, so I don't see
how we can assume that the current use case is the only use case we'll
ever see.  Therefore we cannot assume this will only be consumed by a
specific cloud operator making use of NVIDIA hardware.  Other vendors
may implement this spec for other environments.  We might even see an
implementation of a virtual virtio-net device with SR-IOV.
> In that case upon probe() of the vfio-virtio driver, we'll just pick-up
> the default vfio-pci 'ops' and in the guest we may have the same
device
> ID as of in the host.
> 
> With that approach we may not require a HOST side control (i.e. sysfs, 
> etc.), but stay with a s device control based on its user manual.
> 
> At the end, we don't expect any functional issue nor any compatible 
> problem with the new driver, both modern and legacy drivers can work in 
> the guest.
> 
> Can that work for you ?
This is not being proposed as an NVIDIA specific driver, we can't make
such claims relative to all foreseeable implementations of virtio-net.
> > We can't use the argument that users wanting the default device
should
> > use vfio-pci rather than virtio-vfio-pci because we've already
defined
> > the algorithm by which libvirt should choose a variant driver for a
> > device.  libvirt will choose this driver for all virtio-net devices.
> >
> > This driver effectively has the option to expose two different
profiles
> > for the device, native or transitional.  We've discussed profile
> > support for variant drivers previously as an equivalent functionality
> > to mdev types, but the only use case for this currently is
out-of-tree.
> > I think this might be the opportunity to define how device profiles
are
> > exposed and selected in a variant driver.
> >
> > Jason had previously suggested a devlink interface for this, but I
> > understand that path had been shot down by devlink developers. 
Another
> > obvious option is sysfs, where we might imagine an optional
"profiles"
> > directory, perhaps under vfio-dev.  Attributes of
"available" and
> > "current" could allow discovery and selection of a profile
similar to
> > mdev types.  
> 
> Referring to the sysfs option,
> 
> Do you expect the sysfs data to effect the libvirt decision ? may that 
> require changes in libvirt ?
We don't have such changes in libvirt for mdev, other than the ability
of the nodedev information to return available type information.
Generally the mdev type is configured outside of libvirt, which falls
into the same sort of configuration as necessary to enable migration on
mlx5-vfio-pci.

It's possible we could allows a default profile which would be used if
the open_device callback is used without setting a profile, but we need
to be careful of vGPU use cases where profiles consume resources and a
default selection may affect other devices.
> In addition,
> May that be too late as the sysfs entry will be created upon driver 
> binding by libvirt or that we have in mind some other option to control 
> with that ?
No different than mlx5-vfio-pci, there's a necessary point between
binding the driver and using the device where configuration is needed.
> Jason,
> Can you please comment here as well ?
> 
> > Is this where we should head with this or are there other options to
> > confine this transitional behavior?
> >
> > BTW, what is "acc" in virtiovf_acc_vfio_pci_ops?  
> 
> "acc" is just a short-cut to "access", see also here[1]
a similar usage.
> 
> [1] 
>
https://elixir.bootlin.com/linux/v6.6-rc6/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1380
Per the Kconfig:

	  This provides generic PCI support for HiSilicon ACC devices
	  using the VFIO framework.

Therefore I understood acc in this use case to be a formal reference to
the controller name.
> >> +
> >> +	virtvdev = vfio_alloc_device(virtiovf_pci_core_device,
core_device.vdev,
> >> +				     &pdev->dev, ops);
> >> +	if (IS_ERR(virtvdev))
> >> +		return PTR_ERR(virtvdev);
> >> +
> >> +	dev_set_drvdata(&pdev->dev,
&virtvdev->core_device);
> >> +	ret =
vfio_pci_core_register_device(&virtvdev->core_device);
> >> +	if (ret)
> >> +		goto out;
> >> +	return 0;
> >> +out:
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +	return ret;
> >> +}
> >> +
> >> +static void virtiovf_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct virtiovf_pci_core_device *virtvdev =
dev_get_drvdata(&pdev->dev);
> >> +
> >> +	vfio_pci_core_unregister_device(&virtvdev->core_device);
> >> +	vfio_put_device(&virtvdev->core_device.vdev);
> >> +}
> >> +
> >> +static const struct pci_device_id virtiovf_pci_table[] = {
> >> +	/* Only virtio-net is supported/tested so far */
> >> +	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET,
0x1041) },
> >> +	{}
> >> +};
> >> +
> >> +MODULE_DEVICE_TABLE(pci, virtiovf_pci_table);
> >> +
> >> +static struct pci_driver virtiovf_pci_driver = {
> >> +	.name = KBUILD_MODNAME,
> >> +	.id_table = virtiovf_pci_table,
> >> +	.probe = virtiovf_pci_probe,
> >> +	.remove = virtiovf_pci_remove,
> >> +	.err_handler = &vfio_pci_core_err_handlers,
> >> +	.driver_managed_dma = true,
> >> +};
> >> +
> >> +module_pci_driver(virtiovf_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Yishai Hadas <yishaih at
nvidia.com>");
> >> +MODULE_DESCRIPTION(
> >> +	"VIRTIO VFIO PCI - User Level meta-driver for VIRTIO device
family");
> > Not yet "family" per the device table.  Thanks,  
> 
> Right
> 
> How about dropping the word "family" and say instead ".. for
VIRTIO
> devices" as we have in the Kconfig in that patch [1] ?
> 
> [1] "This provides support for exposing VIRTIO VF devices .."
Are we realistically extending this beyond virtio-net?  Maybe all the
descriptions should be limited to what is actually supported as
proposed rather than aspirational goals.  Thanks,

Alex

Linux Virtualization - Oct 2023 - [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

[PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

[PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices