Michael S. Tsirkin
2023-Sep-25 17:36 UTC
[PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On Fri, Sep 22, 2023 at 01:22:33PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2023 at 11:40:58AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2023 at 12:15:34PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Sep 22, 2023 at 11:13:18AM -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 22, 2023 at 12:25:06PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Jason Gunthorpe <jgg at nvidia.com>
> > > > > > Sent: Friday, September 22, 2023 5:53 PM
> > > > > >
> > > > > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > > > > >
> > > > > > Oh? How? Our team didn't think so.
> > > > >
> > > > > It does not. It was already discussed.
> > > > > The device reset in legacy is not synchronous.
> > > > > The drivers do not wait for reset to complete; it was written for the sw backend.
> > > > > Hence MMIO BAR0 is not the best option in real implementations.
> > > >
> > > > Or maybe they made it synchronous in hardware, that's all.
> > > > After all same is true for the IO BAR0 e.g. for the PF: IO writes
> > > > are posted anyway.
> > >
> > > IO writes are not posted in PCI.
> >
> > Aha, I was confused. Thanks for the correction. I guess you just buffer
> > subsequent transactions while reset is going on and reset quickly enough
> > for it to be seamless then?
>
> From a hardware perspective the CPU issues a non-posted IO write and
> then it stops processing until the far side returns an IO completion.
>
> Using that you can emulate what the SW virtio model did and delay the
> CPU from restarting until the reset is completed.
>
> Since MMIO is always posted, this is not possible to emulate directly
> using MMIO.
>
> Converting IO into non-posted admin commands is a fairly close
> recreation to what actual HW would do.
>
> Jason

I thought you asked how it is possible for hardware to support reset if
all it does is replace IO BAR with memory BAR.
The answer is that since 2011 the reset is followed by a read of the
status field (which isn't much older than MSIX support from 2009 - which
this code assumes). If one uses a Linux driver from 2011 on, then all
you need to do is defer the response to this read until after the reset
is complete.

If you are using older drivers or other OSes, then a reset using a posted
write after the device has operated for a while might not be safe, so
e.g. you might trigger races if you remove drivers from the system or
trigger hot unplug. For example:

	static void virtio_pci_remove(struct pci_dev *pci_dev)
	{
		....
		unregister_virtio_device(&vp_dev->vdev);
		^^^^ triggers reset, then releases memory
		....
		pci_disable_device(pci_dev);
		^^^ blocks DMA by clearing bus master
	}

Here you could see some DMA into memory that has just been released.

As Jason mentions, hardware exists that is used under one of these two
restrictions on the guest (Linux since 2011, or no resets while DMA is
going on), and it works fine with these existing guests.

Given the restrictions, the virtio TC didn't elect to standardize this
approach and instead opted for the heavier approach of converting IO
into non-posted admin commands in software.

--
MST
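
To make the difference between the two guest behaviours concrete, here is
a toy model in C. It is only a sketch under assumptions and not the kernel
or device code: toy_dev, toy_write_status, toy_read_status and
toy_dma_after_reset are hypothetical names, and the three-tick reset
latency is arbitrary. The write models a posted MMIO write that returns
at once; the read models a non-posted read whose completion the device
holds back until the reset has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy device, for illustration only. */
struct toy_dev {
	uint8_t status;        /* device status register */
	int reset_ticks_left;  /* > 0 while an asynchronous reset is in flight */
	bool dma_active;       /* device may still write to guest memory */
};

/* Posted write: returns immediately; a write of 0 only *starts* the reset. */
static void toy_write_status(struct toy_dev *d, uint8_t val)
{
	if (val == 0)
		d->reset_ticks_left = 3; /* arbitrary latency, an assumption */
	else
		d->status = val;
}

/* One unit of device-side progress; the reset completes on the last tick. */
static void toy_tick(struct toy_dev *d)
{
	if (d->reset_ticks_left > 0 && --d->reset_ticks_left == 0) {
		d->status = 0;
		d->dma_active = false;
	}
}

/* Non-posted read: the completion is deferred until the reset is done. */
static uint8_t toy_read_status(struct toy_dev *d)
{
	while (d->reset_ticks_left > 0)
		toy_tick(d);
	return d->status;
}

/*
 * Driver-side reset. With read_back the driver reads status after writing 0
 * (the post-2011 behaviour described above); without it the driver returns
 * right after the posted write. Returns whether DMA may still be active at
 * the point where the driver would go on to release memory.
 */
static bool toy_dma_after_reset(bool read_back)
{
	struct toy_dev d = { .status = 0x0f, .reset_ticks_left = 0,
			     .dma_active = true };

	toy_write_status(&d, 0);
	if (read_back)
		(void)toy_read_status(&d);
	return d.dma_active;
}
```

In this model toy_dma_after_reset(false) reports that DMA may still be in
flight when the driver continues, which is the window in the
virtio_pci_remove example, while toy_dma_after_reset(true) reports that
it has stopped.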