Michael Dalton
2014-Jan-09  03:41 UTC
[PATCH net-next v2 3/4] virtio-net: auto-tune mergeable rx buffer size for improved performance
Sorry, forgot to mention - if we want to explore combining the buffer address and truesize into a single void *, we could also exploit the fact that our size ranges from aligned GOOD_PACKET_LEN to PAGE_SIZE, and potentially encode fewer values for truesize (and require a smaller alignment than 256). The prior e-mail's discussion of 256 byte alignment with 256 values is just one potential design point.
Best,
Mike
Michael S. Tsirkin
2014-Jan-09  06:48 UTC
[PATCH net-next v2 3/4] virtio-net: auto-tune mergeable rx buffer size for improved performance
On Wed, Jan 08, 2014 at 07:41:58PM -0800, Michael Dalton wrote:
> Sorry, forgot to mention - if we want to explore combining the buffer
> address and truesize into a single void *, we could also exploit the
> fact that our size ranges from aligned GOOD_PACKET_LEN to PAGE_SIZE, and
> potentially encode fewer values for truesize (and require a smaller
> alignment than 256). The prior e-mails discussion of 256 byte alignment
> with 256 values is just one potential design point.
>
> Best,
>
> Mike

Good point. I think we should keep the option to make buffers bigger than 4K, so I think we should start with 256 alignment, then see if there are workloads that are improved by smaller alignment.

Can you add wrapper inline functions to pack/unpack size and buffer pointer to/from void *? This way it will be easy to experiment with different alignments.
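For readers following along, the kind of wrapper being requested might look roughly like the userspace sketch below. The exact encoding from the earlier e-mails is not shown in this thread, so this assumes the low 8 bits of a 256-byte-aligned pointer hold (truesize / 256) - 1. The names buf_pack, buf_base, buf_truesize and BUF_SHIFT are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob: change BUF_SHIFT to experiment with other alignments. */
#define BUF_SHIFT 8
#define BUF_ALIGN ((uintptr_t)1 << BUF_SHIFT)	/* 256 bytes */
#define BUF_MASK  (BUF_ALIGN - 1)		/* low bits free for the size */

/* Pack an aligned base address and a truesize into one pointer-sized value.
 * Assumed encoding: the low bits hold (truesize / BUF_ALIGN) - 1. */
static inline void *buf_pack(void *base, size_t truesize)
{
	assert(((uintptr_t)base & BUF_MASK) == 0);	/* base must be aligned */
	assert(truesize != 0 && truesize % BUF_ALIGN == 0);
	assert(truesize / BUF_ALIGN - 1 <= BUF_MASK);	/* must fit in low bits */
	return (void *)((uintptr_t)base | (truesize / BUF_ALIGN - 1));
}

/* Recover the buffer base address by clearing the size bits. */
static inline void *buf_base(void *ctx)
{
	return (void *)((uintptr_t)ctx & ~BUF_MASK);
}

/* Recover the truesize from the low bits. */
static inline size_t buf_truesize(void *ctx)
{
	return (size_t)(((uintptr_t)ctx & BUF_MASK) + 1) << BUF_SHIFT;
}
```

Under these assumptions, 256 byte alignment with 256 values covers sizes from 256 bytes up to 64K, which leaves room for buffers bigger than 4K. Because callers only go through the three wrappers, trying a different alignment means changing BUF_SHIFT alone.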
Michael Dalton
2014-Jan-09  08:28 UTC
[PATCH net-next v2 3/4] virtio-net: auto-tune mergeable rx buffer size for improved performance
Hi Michael,
Here's a quick sketch of some code that enforces a minimum buffer
alignment of only 64, and has a maximum theoretical buffer size of
MERGEABLE_BUFFER_MIN + (MERGEABLE_BUFFER_ALIGN - 1) * MERGEABLE_BUFFER_ALIGN,
where MERGEABLE_BUFFER_MIN is the aligned GOOD_PACKET_LEN. That is at least
1536 + 63 * 64 = 5568. On x86, we already use a 64 byte alignment, and
this code supports all current buffer sizes, from 1536 to PAGE_SIZE.
#if L1_CACHE_BYTES < 64
#define MERGEABLE_BUFFER_ALIGN 64
#define MERGEABLE_BUFFER_SHIFT 6
#else
#define MERGEABLE_BUFFER_ALIGN L1_CACHE_BYTES
#define MERGEABLE_BUFFER_SHIFT L1_CACHE_SHIFT
#endif
#define MERGEABLE_BUFFER_MIN ALIGN(GOOD_PACKET_LEN + \
                                   sizeof(struct virtio_net_hdr_mrg_rxbuf), \
                                   MERGEABLE_BUFFER_ALIGN)
#define MERGEABLE_BUFFER_MAX min(MERGEABLE_BUFFER_MIN + \
                                 (MERGEABLE_BUFFER_ALIGN - 1) * \
                                 MERGEABLE_BUFFER_ALIGN, PAGE_SIZE)
/* Extract buffer length from a mergeable buffer context. */
static u16 get_mergeable_buf_ctx_len(void *ctx) {
        u16 len = (uintptr_t)ctx & (MERGEABLE_BUFFER_ALIGN - 1);
        return MERGEABLE_BUFFER_MIN + (len << MERGEABLE_BUFFER_SHIFT);
}
/* Extract buffer base address from a mergeable buffer context. */
static void *get_mergeable_buf_ctx_base(void *ctx) {
        return (void *) ((uintptr_t)ctx & -MERGEABLE_BUFFER_ALIGN);
}
/* Convert a base address and length to a mergeable buffer context. */
static void *to_mergeable_buf_ctx(void *base, u16 len) {
        len -= MERGEABLE_BUFFER_MIN;
        return (void *) ((uintptr_t)base | (len >> MERGEABLE_BUFFER_SHIFT));
}
/* Compute the packet buffer length for a receive queue. */
static u16 get_mergeable_buffer_len(struct receive_queue *rq) {
        u16 len = clamp_t(u16, ewma_read(&rq->avg_pkt_len),
                          MERGEABLE_BUFFER_MIN,
                          MERGEABLE_BUFFER_MAX);
        return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
}
Best,
Mike
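The range arithmetic and the encode/decode round trip in the sketch above can be checked in userspace. The version below is only an illustration under assumed values: 64 byte alignment, 4K pages, and the 1536 byte minimum quoted in the message. The names ctx_encode, ctx_len, ctx_base and the constants are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only (not taken from kernel headers). */
#define ALIGN_BYTES 64u
#define ALIGN_SHIFT 6
#define BUF_MIN     1536u	/* aligned minimum quoted in the message */
#define PAGE_BYTES  4096u	/* assumes 4K pages */
#define BUF_LIMIT   (BUF_MIN + (ALIGN_BYTES - 1) * ALIGN_BYTES)

/* 6 low bits give 64 values, so the largest encodable length is 5568. */
_Static_assert(BUF_LIMIT == 5568, "1536 + 63 * 64 should be 5568");
_Static_assert(BUF_LIMIT >= PAGE_BYTES, "4K pages must be encodable");

/* Store (len - BUF_MIN) / 64 in the low 6 bits of an aligned pointer. */
static void *ctx_encode(void *base, unsigned int len)
{
	assert(((uintptr_t)base & (ALIGN_BYTES - 1)) == 0);
	assert(len >= BUF_MIN && len <= BUF_LIMIT);
	assert((len - BUF_MIN) % ALIGN_BYTES == 0);
	return (void *)((uintptr_t)base | ((len - BUF_MIN) >> ALIGN_SHIFT));
}

/* Recover the length: low bits count 64-byte units above BUF_MIN. */
static unsigned int ctx_len(void *ctx)
{
	unsigned int units = (uintptr_t)ctx & (ALIGN_BYTES - 1);

	return BUF_MIN + (units << ALIGN_SHIFT);
}

/* Recover the base address by masking off the low 6 bits. */
static void *ctx_base(void *ctx)
{
	return (void *)((uintptr_t)ctx & ~(uintptr_t)(ALIGN_BYTES - 1));
}
```

With these assumptions, every 64-byte-aligned length from 1536 through 4096 uses at most 41 of the 64 available values, so the encoding has headroom beyond a 4K page but not beyond 5568 bytes.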