Michael S. Tsirkin
2023-Sep-25 18:36 UTC
[PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On Mon, Sep 25, 2023 at 08:26:33AM +0000, Parav Pandit wrote:> > > > From: Jason Wang <jasowang at redhat.com> > > Sent: Monday, September 25, 2023 8:00 AM > > > > On Fri, Sep 22, 2023 at 8:25?PM Parav Pandit <parav at nvidia.com> wrote: > > > > > > > > > > From: Jason Gunthorpe <jgg at nvidia.com> > > > > Sent: Friday, September 22, 2023 5:53 PM > > > > > > > > > > > And what's more, using MMIO BAR0 then it can work for legacy. > > > > > > > > Oh? How? Our team didn't think so. > > > > > > It does not. It was already discussed. > > > The device reset in legacy is not synchronous. > > > > How do you know this? > > > Not sure the motivation of same discussion done in the OASIS with you and others in past. > > Anyways, please find the answer below. > > About reset, > The legacy device specification has not enforced below cited 1.0 driver requirement of 1.0. > > "The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0." > > [1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdfBasically, I think any drivers that did not read status (linux pre 2011) before freeing memory under DMA have a reset path that is racy wrt DMA, since memory writes are posted and IO writes while not posted have completion that does not order posted transactions, e.g. from pci express spec: D2b An I/O or Configuration Write Completion 37 is permitted to pass a Posted Request. having said that there were a ton of driver races discovered on this path in the years since, I suspect if one cares about this then just avoiding stress on reset is wise.> > > The drivers do not wait for reset to complete; it was written for the sw > > backend. > > > > Do you see there's a flush after reset in the legacy driver? > > > Yes. it only flushes the write by reading it. The driver does not get _wait_ for the reset to complete within the device like above.One can thinkably do that wait in hardware, though. Just defer completion until read is done.> Please see the reset flow of 1.x device as below. > In fact the comment of the 1.x device also needs to be updated to indicate that driver need to wait for the device to finish the reset. > I will send separate patch for improving this comment of vp_reset() to match the spec. > > static void vp_reset(struct virtio_device *vdev) > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > struct virtio_pci_modern_device *mdev = &vp_dev->mdev; > > /* 0 status means a reset. */ > vp_modern_set_status(mdev, 0); > /* After writing 0 to device_status, the driver MUST wait for a read of > * device_status to return 0 before reinitializing the device. > * This will flush out the status write, and flush in device writes, > * including MSI-X interrupts, if any. > */ > while (vp_modern_get_status(mdev)) > msleep(1); > /* Flush pending VQ/configuration callbacks. */ > vp_synchronize_vectors(vdev); > } > > > > static void vp_reset(struct virtio_device *vdev) { > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > /* 0 status means a reset. */ > > vp_legacy_set_status(&vp_dev->ldev, 0); > > /* Flush out the status write, and flush in device writes, > > * including MSi-X interrupts, if any. */ > > vp_legacy_get_status(&vp_dev->ldev); > > /* Flush pending VQ/configuration callbacks. */ > > vp_synchronize_vectors(vdev); > > } > > > > Thanks > > > > > > > > > Hence MMIO BAR0 is not the best option in real implementations. > > > >
Zhu, Lingshan
2023-Sep-26 02:34 UTC
[PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On 9/26/2023 2:36 AM, Michael S. Tsirkin wrote:> On Mon, Sep 25, 2023 at 08:26:33AM +0000, Parav Pandit wrote: >> >>> From: Jason Wang <jasowang at redhat.com> >>> Sent: Monday, September 25, 2023 8:00 AM >>> >>> On Fri, Sep 22, 2023 at 8:25?PM Parav Pandit <parav at nvidia.com> wrote: >>>> >>>>> From: Jason Gunthorpe <jgg at nvidia.com> >>>>> Sent: Friday, September 22, 2023 5:53 PM >>>> >>>>>> And what's more, using MMIO BAR0 then it can work for legacy. >>>>> Oh? How? Our team didn't think so. >>>> It does not. It was already discussed. >>>> The device reset in legacy is not synchronous. >>> How do you know this? >>> >> Not sure the motivation of same discussion done in the OASIS with you and others in past. >> >> Anyways, please find the answer below. >> >> About reset, >> The legacy device specification has not enforced below cited 1.0 driver requirement of 1.0. >> >> "The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0." >> >> [1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf > Basically, I think any drivers that did not read status (linux pre 2011) > before freeing memory under DMA have a reset path that is racy wrt DMA, since > memory writes are posted and IO writes while not posted have completion > that does not order posted transactions, e.g. from pci express spec: > D2b > An I/O or Configuration Write Completion 37 is permitted to pass a Posted Request. > having said that there were a ton of driver races discovered on this > path in the years since, I suspect if one cares about this then > just avoiding stress on reset is wise. > > > >>>> The drivers do not wait for reset to complete; it was written for the sw >>> backend. >>> >>> Do you see there's a flush after reset in the legacy driver? >>> >> Yes. it only flushes the write by reading it. The driver does not get _wait_ for the reset to complete within the device like above. > One can thinkably do that wait in hardware, though. Just defer completion until > read is done.I agree with MST. At least Intel devices work fine with vfio-pci and legacy driver without any changes. So far so good. Thanks Zhu Lingshan> >> Please see the reset flow of 1.x device as below. >> In fact the comment of the 1.x device also needs to be updated to indicate that driver need to wait for the device to finish the reset. >> I will send separate patch for improving this comment of vp_reset() to match the spec. >> >> static void vp_reset(struct virtio_device *vdev) >> { >> struct virtio_pci_device *vp_dev = to_vp_device(vdev); >> struct virtio_pci_modern_device *mdev = &vp_dev->mdev; >> >> /* 0 status means a reset. */ >> vp_modern_set_status(mdev, 0); >> /* After writing 0 to device_status, the driver MUST wait for a read of >> * device_status to return 0 before reinitializing the device. >> * This will flush out the status write, and flush in device writes, >> * including MSI-X interrupts, if any. >> */ >> while (vp_modern_get_status(mdev)) >> msleep(1); >> /* Flush pending VQ/configuration callbacks. */ >> vp_synchronize_vectors(vdev); >> } >> >> >>> static void vp_reset(struct virtio_device *vdev) { >>> struct virtio_pci_device *vp_dev = to_vp_device(vdev); >>> /* 0 status means a reset. */ >>> vp_legacy_set_status(&vp_dev->ldev, 0); >>> /* Flush out the status write, and flush in device writes, >>> * including MSi-X interrupts, if any. */ >>> vp_legacy_get_status(&vp_dev->ldev); >>> /* Flush pending VQ/configuration callbacks. */ >>> vp_synchronize_vectors(vdev); >>> } >>> >>> Thanks >>> >>> >>> >>>> Hence MMIO BAR0 is not the best option in real implementations. >>>> > _______________________________________________ > Virtualization mailing list > Virtualization at lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Parav Pandit
2023-Sep-26 03:45 UTC
[PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
> From: Michael S. Tsirkin <mst at redhat.com> > Sent: Tuesday, September 26, 2023 12:06 AM> One can thinkably do that wait in hardware, though. Just defer completion until > read is done. >Once OASIS does such new interface and if some hw vendor _actually_ wants to do such complex hw, may be vfio driver can adopt to it. When we worked with you, we discussed that there such hw does not have enough returns and hence technical committee choose to proceed with admin commands. I will skip re-discussing all over it again here. The current virto spec is delivering the best trade-offs of functionality, performance and light weight implementation with future forward path towards more features as Jason explained such as migration. All with near zero driver, qemu and sw involvement for rapidly growing feature set...