Rusty Russell
2008-Oct-09 00:55 UTC
[PATCH 2/2] virtio_net: Improve the recv buffer allocation scheme
On Thursday 09 October 2008 06:34:59 Mark McLoughlin wrote:
> From: Herbert Xu <herbert.xu at redhat.com>
>
> If segmentation offload is enabled by the host, we currently allocate
> maximum sized packet buffers and pass them to the host. This uses up
> 20 ring entries, allowing us to supply only 20 packet buffers to the
> host with a 256 entry ring. This is a huge overhead when receiving
> small packets, and is most keenly felt when receiving MTU sized
> packets from off-host.

Hi Mark!

There are three approaches we should investigate before adding YA feature.

Obviously, we can simply increase the number of ring entries.

Secondly, we can put the virtio_net_hdr at the head of the skb data (this
is also worth considering for xmit I think if we have headroom) and drop
MAX_SKB_FRAGS which contains a gratuitous +2.

Thirdly, we can try to coalesce contiguous buffers. The page caching scheme
we have might help here, I don't know. Maybe we should be explicitly trying
to allocate higher orders.

Now, that said, we might need this anyway. But let's try the easy things
first? (Or as well...)

> The size of the logical buffer is
> returned to the guest rather than the size of the individual smaller
> buffers.

That's a virtio transport breakage: can you use the standard virtio
mechanism, and just put the extended length or number of extra buffers
inside the virtio_net_hdr? That makes more sense to me.

> Make use of this support by supplying single page receive buffers to
> the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
> the payload to the skb's linear data buffer and adjust the fragment
> offset to point to the remaining data. This ensures proper alignment
> and allows us to not use any paged data for small packets. If the
> payload occupies multiple pages, we simply append those pages as
> fragments and free the associated skbs.

> +	char *p = page_address(skb_shinfo(skb)->frags[0].page);
...
> +	memcpy(hdr, p, sizeof(*hdr));
> +	p += sizeof(*hdr);

I think you need kmap_atomic() here to access the page. And yes, that will
affect performance :(

A few more comments moved from the patch header into the source wouldn't go
astray, but I'm happy to do that myself (it's been on my TODO for a while).

Thanks!
Rusty.
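For illustration only, the "number of extra buffers inside the virtio_net_hdr"
idea could be expressed as a small extension of the existing receive header.
The struct and field names below are hypothetical and not part of the posted
patch:

	/* Hypothetical extension of the receive header: the host tells the
	 * guest how many descriptors make up one logical packet, instead of
	 * returning a logical length that spans several buffers. */
	struct virtio_net_hdr_mrg {
		struct virtio_net_hdr hdr;	/* existing per-packet header */
		__u16 num_buffers;		/* descriptors used for this packet */
	};

On receive the guest would read num_buffers, pull that many further entries
off the ring, and chain them onto the skb as fragments, rather than relying
on a single logical length that covers multiple buffers.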
Herbert Xu
2008-Oct-09 15:30 UTC
[PATCH 2/2] virtio_net: Improve the recv buffer allocation scheme
On Thu, Oct 09, 2008 at 11:55:59AM +1100, Rusty Russell wrote:
>
> There are three approaches we should investigate before adding YA feature.
> Obviously, we can simply increase the number of ring entries.

That's not going to work so well, as you need to increase the ring size by
a factor of MAX_SKB_FRAGS to achieve the same effect. Basically the current
scheme is either going to suck at non-TSO traffic or it's going to chew up
too many resources.

> Secondly, we can put the virtio_net_hdr at the head of the skb data (this is
> also worth considering for xmit I think if we have headroom) and drop
> MAX_SKB_FRAGS which contains a gratuitous +2.

That's fine, but having skb->data in the ring still means two different
kinds of memory in there, and it sucks when you only have 1500-byte packets.

> Thirdly, we can try to coalesce contiguous buffers. The page caching scheme
> we have might help here, I don't know. Maybe we should be explicitly trying
> to allocate higher orders.

That's not really the key problem here. The problem is that the scheme
we're currently using in virtio-net is simply broken when it comes to
1500-byte packets: most of the entries in the ring buffer go to waste.

We need a scheme that handles 1500-byte packets as well as 64K-byte ones,
without holding down 16M of memory per guest.

> > The size of the logical buffer is
> > returned to the guest rather than the size of the individual smaller
> > buffers.
>
> That's a virtio transport breakage: can you use the standard virtio
> mechanism, and just put the extended length or number of extra buffers
> inside the virtio_net_hdr?

Sure, that sounds reasonable.

> > Make use of this support by supplying single page receive buffers to
> > the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
> > the payload to the skb's linear data buffer and adjust the fragment
> > offset to point to the remaining data. This ensures proper alignment
> > and allows us to not use any paged data for small packets. If the
> > payload occupies multiple pages, we simply append those pages as
> > fragments and free the associated skbs.
>
> > +	char *p = page_address(skb_shinfo(skb)->frags[0].page);
> ...
> > +	memcpy(hdr, p, sizeof(*hdr));
> > +	p += sizeof(*hdr);
>
> I think you need kmap_atomic() here to access the page. And yes, that will
> affect performance :(

No, we don't. kmap would only be necessary for highmem, which we did not
request.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert at gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
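A minimal sketch of the kmap_atomic() point, assuming the receive pages are
allocated without __GFP_HIGHMEM as in the posted patch: a lowmem page always
has a kernel virtual address, so page_address() can be used directly; only a
highmem page would need a temporary atomic mapping. The helper names here are
illustrative, and the km_type slot is an assumption for softirq receive
context:

	#include <linux/gfp.h>
	#include <linux/highmem.h>
	#include <linux/string.h>

	/* Lowmem page (no __GFP_HIGHMEM): permanently mapped, so its
	 * address can be dereferenced directly, as the patch does. */
	static void copy_hdr_lowmem(struct page *page, void *hdr, size_t len)
	{
		memcpy(hdr, page_address(page), len);
	}

	/* A highmem page would need kmap_atomic() before dereferencing,
	 * which is the cost Rusty was worried about. */
	static void copy_hdr_highmem(struct page *page, void *hdr, size_t len)
	{
		void *va = kmap_atomic(page, KM_SKB_DATA_SOFTIRQ);

		memcpy(hdr, va, len);
		kunmap_atomic(va, KM_SKB_DATA_SOFTIRQ);
	}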
Chris Wright
2008-Oct-09 15:35 UTC
[PATCH 2/2] virtio_net: Improve the recv buffer allocation scheme
* Rusty Russell (rusty at rustcorp.com.au) wrote:
> On Thursday 09 October 2008 06:34:59 Mark McLoughlin wrote:
> > From: Herbert Xu <herbert.xu at redhat.com>
> >
> > If segmentation offload is enabled by the host, we currently allocate
> > maximum sized packet buffers and pass them to the host. This uses up
> > 20 ring entries, allowing us to supply only 20 packet buffers to the
> > host with a 256 entry ring. This is a huge overhead when receiving
> > small packets, and is most keenly felt when receiving MTU sized
> > packets from off-host.
>
> There are three approaches we should investigate before adding YA feature.
> Obviously, we can simply increase the number of ring entries.

Tried that; it didn't help much. I don't have my numbers handy, but it
levelled off at about 512 entries and the boost was modest. It's still
wasteful to preallocate like that on the off-chance it's a large packet.

thanks,
-chris