On Tue, 11 Jan 2022 10:46:41 -0500, Michael S. Tsirkin <mst at redhat.com>
wrote:> On Tue, Jan 11, 2022 at 08:40:56AM +0000, Karlsson, Magnus wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jason Wang <jasowang at redhat.com>
> > > Sent: Tuesday, January 11, 2022 9:26 AM
> > > To: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
> > > Cc: Karlsson, Magnus <magnus.karlsson at intel.com>;
virtualization
> > > <virtualization at lists.linux-foundation.org>; Michael
S.Tsirkin
> > > <mst at redhat.com>
> > > Subject: Re: RE: [PATCH 0/6] virtio: support advance DMA
> > >
> > > On Tue, Jan 11, 2022 at 4:17 PM Xuan Zhuo <xuanzhuo at
linux.alibaba.com>
> > > wrote:
> > > >
> > > > On Tue, 11 Jan 2022 08:04:05 +0000, Karlsson, Magnus
> > > <magnus.karlsson at intel.com> wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Xuan Zhuo <xuanzhuo at
linux.alibaba.com>
> > > > > > Sent: Tuesday, January 11, 2022 7:17 AM
> > > > > > To: Jason Wang <jasowang at redhat.com>
> > > > > > Cc: virtualization <virtualization at
lists.linux-foundation.org>;
> > > > > > Michael S.Tsirkin <mst at redhat.com>;
Karlsson, Magnus
> > > > > > <magnus.karlsson at intel.com>
> > > > > > Subject: Re: [PATCH 0/6] virtio: support advance
DMA
> > > > > >
> > > > > > On Tue, 11 Jan 2022 10:54:45 +0800, Jason Wang
> > > > > > <jasowang at redhat.com>
> > > > > > wrote:
> > > > > > > On Mon, Jan 10, 2022 at 5:59 PM Michael S.
Tsirkin
> > > > > > > <mst at redhat.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > On Fri, Jan 07, 2022 at 02:33:00PM
+0800, Xuan Zhuo wrote:
> > > > > > > > > virtqueue_add() only supports
virtual addresses, dma is
> > > > > > > > > completed in virtqueue_add().
> > > > > > > > >
> > > > > > > > > In some scenarios (such as the
AF_XDP scenario), DMA is
> > > > > > > > > completed in advance, so it is
necessary for us to support
> > > > > > > > > passing the DMA address
> > > > > > to virtqueue_add().
> > > > > > > > >
> > > > > > > > > This patch set stipulates that if
sg->dma_address is not
> > > > > > > > > NULL, use this address as the DMA
address. And record this
> > > > > > > > > information in
> > > > > > > > > extra->flags, which can be
skipped when executing dma unmap.
> > > > > > > > >
> > > > > > > > > extra->flags |=
VRING_DESC_F_PREDMA;
> > > > > > > > >
> > > > > > > > > But the indirect desc does not have
a corresponding extra,
> > > > > > > > > so the second and third patches of
this patch set are to
> > > > > > > > > allocate the corresponding extra
while allocating the
> > > > > > > > > indirect desc. Each desc must have
a corresponding extra
> > > > > > > > > because it is possible in an sgs
some are advance DMA, while
> > > > > > > > > others are virtual addresses. So we
must
> > > > > > allocate an extra for each indirect desc.
> > > > > > > >
> > > > > > > >
> > > > > > > > I didn't realize AF_XDP didn't
have space to stuff the header into.
> > > > > > > > Jason, is that expected?
> > > > > > >
> > > > > > > I might be wrong but it looks to me AF_XDP
allows to reserve
> > > > > > > sufficient headroom via xdp_umem_reg_v1.
> > > > > > >
> > > > > >
> > > > > > I understand that there is a headroom for
receiving packages,
> > > > > > which can be used to put virtio headers. But there
is no headroom
> > > > > > defined in the direction of sending packages. I
hope Magnus
> > > > > > Karlsson can help confirm whether there is any
misunderstanding.
> > > > >
> > > > > You can specify the amount of headroom you want on Tx
by adjusting the
> > > "addr" field in the descriptor of the Tx ring. If your
chunk starts at address X
> > > in the umem and you want 128 bytes of headroom, just write your
packet
> > > into X+128 and put that address into the Tx descriptor. Will this
solve your
> > > problem? If not, what would you need from AF_XDP to make it work?
> > > > >
> > > > > On Rx, there is always 256 bytes worth of headroom as
specified by XDP.
> > > If you need extra, you can set the headroom variable when you
register the
> > > umem.
> > > >
> > > > The driver of virtio net, when passing the packet to the
hardware,
> > > > should add a virtnet hdr (12 bytes) in front of the packet.
Both rx
> > > > and tx should add such a header. AF_XDP has a space of 256
bytes in
> > > > the rx process. We can reuse this space. The direction of
AF_XDP tx has no
> > > such regulation.
> > > >
> > > > The method you mentioned requires user cooperation, which is
not a
> > > > good method for driver implementation.
> > >
> > > This will result in a non-portable userspace program. I wonder
why TX has
> > > become a problem here actually, anyhow we can use a dedicated sg
for vnet
> > > hdr? And if we packed all vnet headers in an array it will give
less
> > > performance impact.
> >
> > Yes, it would be great if you could put this somewhere else ?.
Especially since I do not see a way out of the problem of the driver requiring
headroom on Tx, since this is completely left up to user-space and user-space
packet offset == kernel-space packet offset within the chunk, always. The only
way to change that is through copying, but then it is not zero-copy anymore. One
could wish there was a mechanism of communicating this requirement to
user-space, but introducing that now would make any existing app not work for
virtio-net. So probably easier if you could come up with a solution not
requiring headroom for the driver. If not, please let me know.
>
> We can put the headroom in a separate buffer with a gather list. Gather
> list will always be slower than making it linear though. virtio is
> likely not unique in that it works better with a headroom.
>
> So it could be an optional thing where we handle a gather list too,
> let's say it's not a requirement but a performance hint.
> XDP generally is not too stingy with memory, so maybe just
> ask apps to preferably add a couple of hundred bytes headspace
> if they can, and then it's portable across drivers.
> And virtio will legacy where that space is not there.
> How does this sound?
I think the same way, if the application leaves us enough space, then we can use
this space to place the vnet hdr. If not, use a separate buffer.
The problem now is that AF_XDP needs a general mechanism to inform the driver
that there is a space in front of the packet.
This actually led me to some other thoughts, AF_XDP should pass more information
to the driver, such as using csum and other hardware offload.
When xdp_desc.options has a specified flag, addr points to not packet, but a
struct
struct {
int offset; // packet offset with addr
int csum_start;
int csum_offset;
.......
}
@Karlsson, Magnus
Thanks.
>
>
>
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > > It would be best if we could not use indirect.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > > > It would be best to fix that,
performance is best if header is
> > > > > > > > linear with the data ...
> > > > > > >
> > > > > > > This looks like a must otherwise we may meet
trouble in zerocopy
> > > receive.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Or maybe we can reduce the use of
indirect somewhat, at least
> > > > > > > > while the ring is mostly empty?
> > > > > > > >
> > > > > > > > > Xuan Zhuo (6):
> > > > > > > > > virtio: rename
vring_unmap_state_packed() to
> > > > > > > > > vring_unmap_extra_packed()
> > > > > > > > > virtio: split: alloc indirect
desc with extra
> > > > > > > > > virtio: packed: alloc indirect
desc with extra
> > > > > > > > > virtio: split:
virtqueue_add_split() support dma address
> > > > > > > > > virtio: packed:
virtqueue_add_packed() support dma address
> > > > > > > > > virtio: add api virtio_dma_map()
for advance dma
> > > > > > > > >
> > > > > > > > > drivers/virtio/virtio_ring.c | 387
++++++++++++++++++++---------
> > > ------
> > > > > > > > > include/linux/virtio.h | 9
+
> > > > > > > > > 2 files changed, 232
insertions(+), 164 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.31.0
> > > > > > > >
> > > > > > >
> > > >
> >
>