Parav Pandit
2022-Sep-07 19:06 UTC
[PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets
> From: Michael S. Tsirkin <mst at redhat.com>
> Sent: Wednesday, September 7, 2022 2:16 PM
>
> On Wed, Sep 07, 2022 at 04:12:47PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst at redhat.com>
> > > Sent: Wednesday, September 7, 2022 10:40 AM
> > >
> > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst at redhat.com>
> > > > > Sent: Wednesday, September 7, 2022 10:30 AM
> > > >
> > > > [..]
> > > > > > > actually how does this waste space? Is this because your
> > > > > > > device does not have INDIRECT?
> > > > > > VQ is 256 entries deep.
> > > > > > Driver posted total of 256 descriptors.
> > > > > > Each descriptor points to a page of 4K.
> > > > > > These descriptors are chained as 4K * 16.
> > > > >
> > > > > So without indirect then? with indirect each descriptor can
> > > > > point to 16 entries.
> > > > >
> > > > With indirect, can it post 256 * 16 size buffers even though vq
> > > > depth is 256 entries?
> > > > I recall that total number of descriptors with direct/indirect
> > > > descriptors is limited to vq depth.
> > > >
> > > > Was there some recent clarification occurred in the spec to clarify this?
> > > >
> > > This would make INDIRECT completely pointless. So I don't think we
> > > ever had such a limitation.
> > > The only thing that comes to mind is this:
> > >
> > > A driver MUST NOT create a descriptor chain longer than the Queue
> > > Size of the device.
> > >
> > > but this limits individual chain length not the total length of all chains.
> > >
> > Right.
> > I double checked in virtqueue_add_split() which doesn't count table
> > entries towards desc count of VQ for indirect case.
> >
> > With indirect descriptors without this patch the situation is even worse
> > with memory usage.
> > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and
> > used) buffer is only 2.3 Mbytes.
>
> Yes. So just so we understand the reason for the performance improvement
> is this because of memory usage? Or is this because device does not have
> INDIRECT?

Because of the shallow queue, which is only 16 entries deep.
With the driver's turnaround time to repost buffers, the device sits idle without any RQ buffers.
With this improvement, the device has 85 buffers instead of 16 to receive packets.

Enabling indirect in the device can help, at the cost of 7x higher memory per VQ in the guest VM.
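To make the arithmetic behind these figures concrete, the small standalone C sketch below reproduces them. The constants (a 256-entry RQ, 4K pages, 16-page/64K chains, a 9000-byte MTU) are assumptions taken from the numbers quoted in this thread, not values read from a real device or from the patch itself.

/*
 * Back-of-the-envelope math for the buffer counts discussed above.
 * All constants are assumptions based on the numbers quoted in this
 * thread; they are not read from a real device or from the patch.
 */
#include <stdio.h>

#define QUEUE_SIZE      256     /* RQ depth quoted in the thread */
#define PAGE_SIZE       4096    /* 4K pages */
#define MAX_CHAIN_PAGES 16      /* 64K (GSO-sized) chain, pre-patch */
#define MTU             9000    /* example jumbo MTU */

int main(void)
{
	/* Pre-patch: every big-packet buffer is a 16-page (64K) chain, so
	 * a 256-entry queue holds only 256 / 16 = 16 buffers. */
	int old_buffers = QUEUE_SIZE / MAX_CHAIN_PAGES;

	/* With the patch: the chain is sized for the MTU, i.e. 3 pages
	 * for a 9000-byte MTU, so ~85 buffers fit in the same queue. */
	int pages_per_mtu = (MTU + PAGE_SIZE - 1) / PAGE_SIZE;
	int new_buffers = QUEUE_SIZE / pages_per_mtu;

	/* Indirect case quoted above: 256 chains of 64K each is 16 MB
	 * posted per VQ, while 256 MTU-sized packets actually use only
	 * about 2.3 MB, roughly a 7x difference. */
	double posted_mb = (double)QUEUE_SIZE * MAX_CHAIN_PAGES * PAGE_SIZE
			   / (1024 * 1024);
	double used_mb = (double)QUEUE_SIZE * MTU / (1024 * 1024);

	printf("pre-patch: %d buffers per %d-deep RQ\n", old_buffers, QUEUE_SIZE);
	printf("patched:   %d buffers per %d-deep RQ\n", new_buffers, QUEUE_SIZE);
	printf("indirect:  %.0f MB posted vs ~%.1f MB used (%.0fx)\n",
	       posted_mb, used_mb, posted_mb / used_mb);
	return 0;
}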
Michael S. Tsirkin
2022-Sep-07 19:11 UTC
[PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets
On Wed, Sep 07, 2022 at 07:06:09PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst at redhat.com>
> > Sent: Wednesday, September 7, 2022 2:16 PM
> >
> > On Wed, Sep 07, 2022 at 04:12:47PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst at redhat.com>
> > > > Sent: Wednesday, September 7, 2022 10:40 AM
> > > >
> > > > On Wed, Sep 07, 2022 at 02:33:02PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst at redhat.com>
> > > > > > Sent: Wednesday, September 7, 2022 10:30 AM
> > > > >
> > > > > [..]
> > > > > > > > actually how does this waste space? Is this because your
> > > > > > > > device does not have INDIRECT?
> > > > > > > VQ is 256 entries deep.
> > > > > > > Driver posted total of 256 descriptors.
> > > > > > > Each descriptor points to a page of 4K.
> > > > > > > These descriptors are chained as 4K * 16.
> > > > > >
> > > > > > So without indirect then? with indirect each descriptor can
> > > > > > point to 16 entries.
> > > > > >
> > > > > With indirect, can it post 256 * 16 size buffers even though vq
> > > > > depth is 256 entries?
> > > > > I recall that total number of descriptors with direct/indirect
> > > > > descriptors is limited to vq depth.
> > > > >
> > > > > Was there some recent clarification occurred in the spec to clarify this?
> > > > >
> > > > This would make INDIRECT completely pointless. So I don't think we
> > > > ever had such a limitation.
> > > > The only thing that comes to mind is this:
> > > >
> > > > A driver MUST NOT create a descriptor chain longer than the Queue
> > > > Size of the device.
> > > >
> > > > but this limits individual chain length not the total length of all chains.
> > > >
> > > Right.
> > > I double checked in virtqueue_add_split() which doesn't count table
> > > entries towards desc count of VQ for indirect case.
> > >
> > > With indirect descriptors without this patch the situation is even worse
> > > with memory usage.
> > > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and
> > > used) buffer is only 2.3 Mbytes.
> >
> > Yes. So just so we understand the reason for the performance improvement
> > is this because of memory usage? Or is this because device does not have
> > INDIRECT?
>
> Because of the shallow queue, which is only 16 entries deep.

but why is the queue just 16 entries? does the device not support
indirect? because with indirect you get 256 entries, with 16 s/g each.

> With the driver's turnaround time to repost buffers, the device sits idle without any RQ buffers.
> With this improvement, the device has 85 buffers instead of 16 to receive packets.
>
> Enabling indirect in the device can help, at the cost of 7x higher memory per VQ in the guest VM.
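A minimal sketch of the descriptor accounting being debated here, to illustrate the point above: in a split ring, a direct 16-entry chain consumes 16 ring descriptors, while a buffer posted with VIRTIO_RING_F_INDIRECT_DESC consumes only one ring descriptor and keeps its 16 scatter/gather entries in a side table. This models the accounting rule only and is not the actual virtqueue_add_split() code; the constants are the ones used in this thread.

/*
 * Simplified model of split-ring descriptor accounting, showing why
 * INDIRECT changes how many buffers fit in a queue of fixed depth.
 * A sketch of the rule discussed above, not real virtio-ring code.
 */
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_SIZE 256
#define SG_PER_BUF 16    /* 16 x 4K pages = one 64K big-packet buffer */

/* Ring descriptors a single posted buffer consumes. */
static int descs_used(int sg_entries, bool indirect)
{
	/* With INDIRECT the chain lives in a separate table, so the ring
	 * spends only one descriptor on it; without INDIRECT every s/g
	 * entry is itself a ring descriptor. */
	return indirect ? 1 : sg_entries;
}

int main(void)
{
	int direct_bufs   = QUEUE_SIZE / descs_used(SG_PER_BUF, false);
	int indirect_bufs = QUEUE_SIZE / descs_used(SG_PER_BUF, true);

	printf("direct:   %d buffers of 64K in a %d-deep queue\n",
	       direct_bufs, QUEUE_SIZE);
	printf("indirect: %d buffers of 64K in a %d-deep queue\n",
	       indirect_bufs, QUEUE_SIZE);
	return 0;
}

Which option wins in practice is exactly the trade-off discussed above: 256 outstanding receive buffers versus roughly 7x more guest memory posted per VQ.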