Michael S. Tsirkin
2021-May-06 08:12 UTC
[RFC PATCH V2 0/7] Do not read from descriptor ring
On Thu, May 06, 2021 at 11:20:30AM +0800, Jason Wang wrote:
> 
> On 2021/4/23 at 4:09 PM, Jason Wang wrote:
> > Hi:
> >
> > Sometimes the driver doesn't trust the device. This usually happens
> > with encrypted VMs or VDUSE[1]. In both cases, a technology like
> > swiotlb is used to prevent the device from poking at or mangling
> > memory. But this is not sufficient, since the current virtio driver
> > may trust what is stored in the descriptor table (a coherent
> > mapping) when performing DMA operations such as unmap and bounce,
> > so the device may exploit the behaviour of swiotlb to perform
> > attacks[2].
> >
> > To protect against a malicious device, this series stores and uses
> > the descriptor metadata in an auxiliary structure which cannot be
> > accessed via swiotlb, instead of the copies in the descriptor
> > table. This makes the descriptor table write-only from the driver's
> > point of view.
> >
> > Actually, we have almost achieved that with the packed virtqueue;
> > we only need to fix a corner case in the handling of mapping
> > errors. For the split virtqueue we simply follow what was done for
> > the packed one.
> >
> > Note that we don't duplicate descriptor metadata for indirect
> > descriptors, since they use a stream mapping which is read-only, so
> > it is safe as long as the metadata of the non-indirect descriptors
> > is correct.
> >
> > For the split virtqueue the change increases the memory footprint
> > due to the auxiliary metadata, but the cost is almost negligible in
> > simple tests like pktgen or netperf.
> >
> > Lightly tested with packed on/off, iommu on/off, and swiotlb
> > force/off in the guest.
> >
> > Please review.
> >
> > Changes from V1:
> > - Always use auxiliary metadata for the split virtqueue
> > - Don't read from the descriptor when detaching an indirect
> >   descriptor
> 
> 
> Hi Michael:
> 
> Our QE sees no regression in the perf test on a 10G card, but some
> regressions (5%-10%) on a 40G card.
> 
> I think this is expected since we increase the footprint. Are you OK
> with this so that we can try to optimize on top, or do you have other
> ideas?
> 
> Thanks

Let's try for just a bit; it won't make this merge window anyway:

I have an old idea. Add a way to find out that unmap is a nop (or,
more exactly, that it does not use the address/length). Then, in that
case, we would not need the extra data even with the DMA API. Hmm?

> >
> > [1] https://lore.kernel.org/netdev/fab615ce-5e13-a3b3-3715-a4203b4ab010 at redhat.com/T/
> > [2] https://yhbt.net/lore/all/c3629a27-3590-1d9f-211b-c0b7be152b32 at redhat.com/T/#mc6b6e2343cbeffca68ca7a97e0f473aaa871c95b
> >
> > Jason Wang (7):
> >   virtio-ring: maintain next in extra state for packed virtqueue
> >   virtio_ring: rename vring_desc_extra_packed
> >   virtio-ring: factor out desc_extra allocation
> >   virtio_ring: secure handling of mapping errors
> >   virtio_ring: introduce virtqueue_desc_add_split()
> >   virtio: use err label in __vring_new_virtqueue()
> >   virtio-ring: store DMA metadata in desc_extra for split virtqueue
> >
> >  drivers/virtio/virtio_ring.c | 201 +++++++++++++++++++++++++----------
> >  1 file changed, 144 insertions(+), 57 deletions(-)
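For readers following the thread: the auxiliary metadata discussed above
is a per-descriptor shadow kept in ordinary driver memory that the
device cannot reach through swiotlb. A minimal sketch, assuming the
vring_desc_extra naming from the patch titles (the exact layout in the
series may differ):

struct vring_desc_extra {
	dma_addr_t addr;	/* DMA address, consulted on unmap */
	u32 len;		/* mapped length, consulted on unmap */
	u16 flags;		/* cached descriptor flags */
	u16 next;		/* chain link, never re-read from the ring */
};

Unmap and bounce decisions then read only this shadow, never the
descriptor ring itself, so a device that rewrites a descriptor after
the fact cannot redirect a swiotlb bounce to memory of its choosing.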
Christoph Hellwig
2021-May-06 12:38 UTC
[RFC PATCH V2 0/7] Do not read from descriptor ring
On Thu, May 06, 2021 at 04:12:17AM -0400, Michael S. Tsirkin wrote:
> Let's try for just a bit; it won't make this merge window anyway:
> 
> I have an old idea. Add a way to find out that unmap is a nop (or,
> more exactly, that it does not use the address/length). Then, in that
> case, we would not need the extra data even with the DMA API. Hmm?

So we actually do have a check for that from the early days of the DMA
API, but it only works at compile time: CONFIG_NEED_DMA_MAP_STATE.
Given how rare configs without an iommu or swiotlb are these days,
though, it has stopped being very useful.

Unfortunately a runtime version is not entirely trivial, but if we
allow for false positives we could do something like this:

bool dma_direct_need_state(struct device *dev)
{
	/* some areas could not be covered by any map at all */
	if (dev->dma_range_map)
		return false;
	if (force_dma_unencrypted(dev))
		return false;
	if (dma_direct_need_sync(dev))
		return false;
	return *dev->dma_mask == DMA_BIT_MASK(64);
}

bool dma_need_state(struct device *dev)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	if (dma_map_direct(dev, ops))
		return dma_direct_need_state(dev);
	return ops->unmap_page || ops->sync_single_for_cpu ||
	       ops->sync_single_for_device;
}
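To connect this back to the virtio series: if a helper along these
lines existed, the split virtqueue could allocate its shadow state only
when unmap/sync actually consume it. The sketch below is hypothetical;
dma_need_state() is only the proposal above, not an existing kernel
API, and vring_alloc_desc_extra() stands in for whatever allocation
path the series ends up with:

static int vring_alloc_desc_extra(struct vring_virtqueue *vq,
				  unsigned int num)
{
	/*
	 * Hypothetical use of the proposed dma_need_state(): when
	 * unmap and sync never look at the stored address/length,
	 * skip the per-descriptor shadow entirely and avoid the
	 * footprint behind the reported 40G regression.
	 */
	if (!dma_need_state(vring_dma_dev(vq))) {
		vq->desc_extra = NULL;
		return 0;
	}

	vq->desc_extra = kmalloc_array(num, sizeof(*vq->desc_extra),
				       GFP_KERNEL);
	return vq->desc_extra ? 0 : -ENOMEM;
}

The compile-time analogue already exists: with CONFIG_NEED_DMA_MAP_STATE
disabled, the dma_unmap_addr()/dma_unmap_len() helpers and their
DEFINE_DMA_UNMAP_* storage compile away to nothing, so the proposal
above is essentially a runtime version of that check.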