Stefano Garzarella
2019-Jul-30 09:35 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote:> On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote: > > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote: > > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote: > > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote: > > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > > > > > Since virtio-vsock was introduced, the buffers filled by the host > > > > > > > and pushed to the guest using the vring, are directly queued in > > > > > > > a per-socket list. These buffers are preallocated by the guest > > > > > > > with a fixed size (4 KB). > > > > > > > > > > > > > > The maximum amount of memory used by each socket should be > > > > > > > controlled by the credit mechanism. > > > > > > > The default credit available per-socket is 256 KB, but if we use > > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > > > > > guest will continue to fill the vring with new 4 KB free buffers > > > > > > > to avoid starvation of other sockets. > > > > > > > > > > > > > > This patch mitigates this issue copying the payload of small > > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > > > > > order to avoid wasting memory. > > > > > > > > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > > > > > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > > > > > > > > > > > > This is good enough for net-next, but for net I think we > > > > > > should figure out how to address the issue completely. > > > > > > Can we make the accounting precise? What happens to > > > > > > performance if we do? > > > > > > > > > > > > > > > > In order to do more precise accounting maybe we can use the buffer size, > > > > > instead of payload size when we update the credit available. > > > > > In this way, the credit available for each socket will reflect the memory > > > > > actually used. > > > > > > > > > > I should check better, because I'm not sure what happen if the peer sees > > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB > > > > > buffer). > > > > > > > > > > The other option is to copy each packet in a new buffer like I did in > > > > > the v2 [2], but this forces us to make a copy for each packet that does > > > > > not fill the entire buffer, perhaps too expensive. > > > > > > > > > > [2] https://patchwork.kernel.org/patch/10938741/ > > > > > > > > > > > > > > > Thanks, > > > > > Stefano > > > > > > > > Interesting. You are right, and at some level the protocol forces copies. > > > > > > > > We could try to detect that the actual memory is getting close to > > > > admin limits and force copies on queued packets after the fact. > > > > Is that practical? > > > > > > Yes, I think it is doable! > > > We can decrease the credit available with the buffer size queued, and > > > when the buffer size of packet to queue is bigger than the credit > > > available, we can copy it. > > > > > > > > > > > And yes we can extend the credit accounting to include buffer size. > > > > That's a protocol change but maybe it makes sense. > > > > > > Since we send to the other peer the credit available, maybe this > > > change can be backwards compatible (I'll check better this). > > > > What I said was wrong. > > > > We send a counter (increased when the user consumes the packets) and the > > "buf_alloc" (the max memory allowed) to the other peer. > > It makes a difference between a local counter (increased when the > > packets are sent) and the remote counter to calculate the credit available: > > > > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit) > > { > > u32 ret; > > > > spin_lock_bh(&vvs->tx_lock); > > ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); > > if (ret > credit) > > ret = credit; > > vvs->tx_cnt += ret; > > spin_unlock_bh(&vvs->tx_lock); > > > > return ret; > > } > > > > Maybe I can play with "buf_alloc" to take care of bytes queued but not > > used. > > > > Thanks, > > Stefano > > Right. And the idea behind it all was that if we send a credit > to remote then we have space for it.Yes.> I think the basic idea was that if we have actual allocated > memory and can copy data there, then we send the credit to > remote. > > Of course that means an extra copy every packet. > So as an optimization, it seems that we just assume > that we will be able to allocate a new buffer.Yes, we refill the virtqueue when half of the buffers were used.> > First this is not the best we can do. We can actually do > allocate memory in the socket before sending credit.In this case, IIUC we should allocate an entire buffer (4KB), so we can reuse it if the packet is big.> If packet is small then we copy it there. > If packet is big then we queue the packet, > take the buffer out of socket and add it to the virtqueue. > > Second question is what to do about medium sized packets. > Packet is 1K but buffer is 4K, what do we do? > And here I wonder - why don't we add the 3K buffer > to the vq?This would allow us to have an accurate credit account. The problem here is the compatibility. Before this series virtio-vsock and vhost-vsock modules had the RX buffer size hard-coded (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller of 4K, there might be issues. Maybe it is the time to add add 'features' to virtio-vsock device. Thanks, Stefano
Michael S. Tsirkin
2019-Jul-30 20:42 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:> On Mon, Jul 29, 2019 at 03:10:15PM -0400, Michael S. Tsirkin wrote: > > On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote: > > > On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote: > > > > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote: > > > > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote: > > > > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > > > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > > > > > > Since virtio-vsock was introduced, the buffers filled by the host > > > > > > > > and pushed to the guest using the vring, are directly queued in > > > > > > > > a per-socket list. These buffers are preallocated by the guest > > > > > > > > with a fixed size (4 KB). > > > > > > > > > > > > > > > > The maximum amount of memory used by each socket should be > > > > > > > > controlled by the credit mechanism. > > > > > > > > The default credit available per-socket is 256 KB, but if we use > > > > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > > > > > > guest will continue to fill the vring with new 4 KB free buffers > > > > > > > > to avoid starvation of other sockets. > > > > > > > > > > > > > > > > This patch mitigates this issue copying the payload of small > > > > > > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > > > > > > order to avoid wasting memory. > > > > > > > > > > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > > > > > > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > > > > > > > > > > > > > > This is good enough for net-next, but for net I think we > > > > > > > should figure out how to address the issue completely. > > > > > > > Can we make the accounting precise? What happens to > > > > > > > performance if we do? > > > > > > > > > > > > > > > > > > > In order to do more precise accounting maybe we can use the buffer size, > > > > > > instead of payload size when we update the credit available. > > > > > > In this way, the credit available for each socket will reflect the memory > > > > > > actually used. > > > > > > > > > > > > I should check better, because I'm not sure what happen if the peer sees > > > > > > 1KB of space available, then it sends 1KB of payload (using a 4KB > > > > > > buffer). > > > > > > > > > > > > The other option is to copy each packet in a new buffer like I did in > > > > > > the v2 [2], but this forces us to make a copy for each packet that does > > > > > > not fill the entire buffer, perhaps too expensive. > > > > > > > > > > > > [2] https://patchwork.kernel.org/patch/10938741/ > > > > > > > > > > > > > > > > > > Thanks, > > > > > > Stefano > > > > > > > > > > Interesting. You are right, and at some level the protocol forces copies. > > > > > > > > > > We could try to detect that the actual memory is getting close to > > > > > admin limits and force copies on queued packets after the fact. > > > > > Is that practical? > > > > > > > > Yes, I think it is doable! > > > > We can decrease the credit available with the buffer size queued, and > > > > when the buffer size of packet to queue is bigger than the credit > > > > available, we can copy it. > > > > > > > > > > > > > > And yes we can extend the credit accounting to include buffer size. > > > > > That's a protocol change but maybe it makes sense. > > > > > > > > Since we send to the other peer the credit available, maybe this > > > > change can be backwards compatible (I'll check better this). > > > > > > What I said was wrong. > > > > > > We send a counter (increased when the user consumes the packets) and the > > > "buf_alloc" (the max memory allowed) to the other peer. > > > It makes a difference between a local counter (increased when the > > > packets are sent) and the remote counter to calculate the credit available: > > > > > > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit) > > > { > > > u32 ret; > > > > > > spin_lock_bh(&vvs->tx_lock); > > > ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); > > > if (ret > credit) > > > ret = credit; > > > vvs->tx_cnt += ret; > > > spin_unlock_bh(&vvs->tx_lock); > > > > > > return ret; > > > } > > > > > > Maybe I can play with "buf_alloc" to take care of bytes queued but not > > > used. > > > > > > Thanks, > > > Stefano > > > > Right. And the idea behind it all was that if we send a credit > > to remote then we have space for it. > > Yes. > > > I think the basic idea was that if we have actual allocated > > memory and can copy data there, then we send the credit to > > remote. > > > > Of course that means an extra copy every packet. > > So as an optimization, it seems that we just assume > > that we will be able to allocate a new buffer. > > Yes, we refill the virtqueue when half of the buffers were used. > > > > > First this is not the best we can do. We can actually do > > allocate memory in the socket before sending credit. > > In this case, IIUC we should allocate an entire buffer (4KB), > so we can reuse it if the packet is big. > > > If packet is small then we copy it there. > > If packet is big then we queue the packet, > > take the buffer out of socket and add it to the virtqueue. > > > > Second question is what to do about medium sized packets. > > Packet is 1K but buffer is 4K, what do we do? > > And here I wonder - why don't we add the 3K buffer > > to the vq? > > This would allow us to have an accurate credit account. > > The problem here is the compatibility. Before this series virtio-vsock > and vhost-vsock modules had the RX buffer size hard-coded > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller > of 4K, there might be issues.Shouldn't be if they are following the spec. If not let's fix the broken parts.> > Maybe it is the time to add add 'features' to virtio-vsock device. > > Thanks, > StefanoWhy would a remote care about buffer sizes? Let's first see what the issues are. If they exist we can either fix the bugs, or code the bug as a feature in spec. -- MST
Stefano Garzarella
2019-Aug-01 10:47 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Tue, Jul 30, 2019 at 04:42:25PM -0400, Michael S. Tsirkin wrote:> On Tue, Jul 30, 2019 at 11:35:39AM +0200, Stefano Garzarella wrote:(...)> > > > The problem here is the compatibility. Before this series virtio-vsock > > and vhost-vsock modules had the RX buffer size hard-coded > > (VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE = 4K). So, if we send a buffer smaller > > of 4K, there might be issues. > > Shouldn't be if they are following the spec. If not let's fix > the broken parts. > > > > > Maybe it is the time to add add 'features' to virtio-vsock device. > > > > Thanks, > > Stefano > > Why would a remote care about buffer sizes? > > Let's first see what the issues are. If they exist > we can either fix the bugs, or code the bug as a feature in spec. >The vhost_transport '.stream_enqueue' callback [virtio_transport_stream_enqueue()] calls the virtio_transport_send_pkt_info(), passing the user message. This function allocates a new packet, copying the user message, but (before this series) it limits the packet size to the VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K): static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, struct virtio_vsock_pkt_info *info) { ... /* we can send less than pkt_len bytes */ if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE) pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; /* virtio_transport_get_credit might return less than pkt_len credit */ pkt_len = virtio_transport_get_credit(vvs, pkt_len); /* Do not send zero length OP_RW pkt */ if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW) return pkt_len; ... } then it queues the packet for the TX worker calling .send_pkt() [vhost_transport_send_pkt() in the vhost_transport case] The main function executed by the TX worker is vhost_transport_do_send_pkt() that picks up a buffer from the virtqueue and it tries to copy the packet (up to 4K) on it. If the buffer allocated from the guest will be smaller then 4K, I think here it will be discarded with an error: static void vhost_transport_do_send_pkt(struct vhost_vsock *vsock, struct vhost_virtqueue *vq) { ... nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter); if (nbytes != pkt->len) { virtio_transport_free_pkt(pkt); vq_err(vq, "Faulted on copying pkt buf\n"); break; } ... } This series changes this behavior since now we will split the packet in vhost_transport_do_send_pkt() depending on the buffer found in the virtqueue. We didn't change the buffer size in this series, so we still backward compatible, but if we will use buffers smaller than 4K, we should encounter the error described above. How do you suggest we proceed if we want to change the buffer size? Maybe adding a feature to "support any buffer size"? Thanks, Stefano
Reasonably Related Threads
- [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
- [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
- [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
- [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
- [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket