Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing the v2 of this series I discovered an huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.
v4:
- rebased all patches on current master (conflicts is Patch 4)
- Patch 1: added Stefan's R-b
- Patch 3: removed lock when buf_alloc is written [David];
           moved this patch after "vsock/virtio: reduce credit update
messages"
           to make it clearer
- Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved
some
           conflicts
v3: https://patchwork.kernel.org/cover/10970145
v2: https://patchwork.kernel.org/cover/10938743
v1: https://patchwork.kernel.org/cover/10885431
Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
support. As Micheal suggested in the v1, I booted host and guest with
'nosmap'.
A brief description of patches:
- Patches 1:   limit the memory usage with an extra copy for small packets
- Patches 2+3: reduce the number of credit update messages sent to the
               transmitter
- Patches 4+5: allow the host to split packets on multiple buffers and use
               VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
                    host -> guest [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5
32         0.032     0.030    0.048    0.051
64         0.061     0.059    0.108    0.117
128        0.122     0.112    0.227    0.234
256        0.244     0.241    0.418    0.415
512        0.459     0.466    0.847    0.865
1K         0.927     0.919    1.657    1.641
2K         1.884     1.813    3.262    3.269
4K         3.378     3.326    6.044    6.195
8K         5.637     5.676   10.141   11.287
16K        8.250     8.402   15.976   16.736
32K       13.327    13.204   19.013   20.515
64K       21.241    21.341   20.973   21.879
128K      21.851    22.354   21.816   23.203
256K      21.408    21.693   21.846   24.088
512K      21.600    21.899   21.921   24.106
                    guest -> host [Gbps]
pkt_size before opt   p 1     p 2+3    p 4+5
32         0.045     0.046    0.057    0.057
64         0.089     0.091    0.103    0.104
128        0.170     0.179    0.192    0.200
256        0.364     0.351    0.361    0.379
512        0.709     0.699    0.731    0.790
1K         1.399     1.407    1.395    1.427
2K         2.670     2.684    2.745    2.835
4K         5.171     5.199    5.305    5.451
8K         8.442     8.500   10.083    9.941
16K       12.305    12.259   13.519   15.385
32K       11.418    11.150   11.988   24.680
64K       10.778    10.659   11.589   35.273
128K      10.421    10.339   10.939   40.338
256K      10.300     9.719   10.508   36.562
512K       9.833     9.808   10.612   35.979
As Stefan suggested in the v1, I measured also the efficiency in this way:
    efficiency = Mbps / (%CPU_Host + %CPU_Guest)
The '%CPU_Guest' is taken inside the VM. I know that it is not the best
way,
but it's provided for free from iperf3 and could be an indication.
        host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5
32         0.35      0.45     0.79     1.02
64         0.56      0.80     1.41     1.54
128        1.11      1.52     3.03     3.12
256        2.20      2.16     5.44     5.58
512        4.17      4.18    10.96    11.46
1K         8.30      8.26    20.99    20.89
2K        16.82     16.31    39.76    39.73
4K        30.89     30.79    74.07    75.73
8K        53.74     54.49   124.24   148.91
16K       80.68     83.63   200.21   232.79
32K      132.27    132.52   260.81   357.07
64K      229.82    230.40   300.19   444.18
128K     332.60    329.78   331.51   492.28
256K     331.06    337.22   339.59   511.59
512K     335.58    328.50   331.56   504.56
        guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
pkt_size before opt   p 1     p 2+3    p 4+5
32         0.43      0.43     0.53     0.56
64         0.85      0.86     1.04     1.10
128        1.63      1.71     2.07     2.13
256        3.48      3.35     4.02     4.22
512        6.80      6.67     7.97     8.63
1K        13.32     13.31    15.72    15.94
2K        25.79     25.92    30.84    30.98
4K        50.37     50.48    58.79    59.69
8K        95.90     96.15   107.04   110.33
16K      145.80    145.43   143.97   174.70
32K      147.06    144.74   146.02   282.48
64K      145.25    143.99   141.62   406.40
128K     149.34    146.96   147.49   489.34
256K     156.35    149.81   152.21   536.37
512K     151.65    150.74   151.52   519.93
[1] https://github.com/stefano-garzarella/iperf/
Stefano Garzarella (5):
  vsock/virtio: limit the memory used per-socket
  vsock/virtio: reduce credit update messages
  vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
  vhost/vsock: split packets to send using multiple buffers
  vsock/virtio: change the maximum packet size allowed
 drivers/vhost/vsock.c                   | 68 ++++++++++++-----
 include/linux/virtio_vsock.h            |  4 +-
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++-----
 4 files changed, 134 insertions(+), 38 deletions(-)
-- 
2.20.1
Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).
The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.
Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vsock.c                   |  2 +
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++----
 4 files changed, 55 insertions(+), 9 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6a50e1d0529c..6c8390a2af52 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
+	pkt->buf_len = pkt->len;
+
 	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
 	if (nbytes != pkt->len) {
 		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..7d973903f52e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -52,6 +52,7 @@ struct virtio_vsock_pkt {
 	/* socket refcnt not held, only use for cancellation */
 	struct vsock_sock *vsk;
 	void *buf;
+	u32 buf_len;
 	u32 len;
 	u32 off;
 	bool reply;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 0815d1357861..082a30936690 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			break;
 		}
 
+		pkt->buf_len = buf_len;
 		pkt->len = buf_len;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index 6f1a8aff65c5..095221f94786 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,9 @@
 /* How long to wait for graceful shutdown of a connection */
 #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
 
+/* Threshold for detecting small packets to copy */
+#define GOOD_COPY_LEN  128
+
 static const struct virtio_transport *virtio_transport_get_ops(void)
 {
 	const struct vsock_transport *t = vsock_core_get_transport();
@@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 		pkt->buf = kmalloc(len, GFP_KERNEL);
 		if (!pkt->buf)
 			goto out_pkt;
+
+		pkt->buf_len = len;
+
 		err = memcpy_from_msg(pkt->buf, info->msg, len);
 		if (err)
 			goto out;
@@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk,
 	return err;
 }
 
+static void
+virtio_transport_recv_enqueue(struct vsock_sock *vsk,
+			      struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	bool free_pkt = false;
+
+	pkt->len = le32_to_cpu(pkt->hdr.len);
+	pkt->off = 0;
+
+	spin_lock_bh(&vvs->rx_lock);
+
+	virtio_transport_inc_rx_pkt(vvs, pkt);
+
+	/* Try to copy small packets into the buffer of last packet queued,
+	 * to avoid wasting memory queueing the entire buffer with a small
+	 * payload.
+	 */
+	if (pkt->len <= GOOD_COPY_LEN &&
!list_empty(&vvs->rx_queue)) {
+		struct virtio_vsock_pkt *last_pkt;
+
+		last_pkt = list_last_entry(&vvs->rx_queue,
+					   struct virtio_vsock_pkt, list);
+
+		/* If there is space in the last packet queued, we copy the
+		 * new packet in its buffer.
+		 */
+		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
+			       pkt->len);
+			last_pkt->len += pkt->len;
+			free_pkt = true;
+			goto out;
+		}
+	}
+
+	list_add_tail(&pkt->list, &vvs->rx_queue);
+
+out:
+	spin_unlock_bh(&vvs->rx_lock);
+	if (free_pkt)
+		virtio_transport_free_pkt(pkt);
+}
+
 static int
 virtio_transport_recv_connected(struct sock *sk,
 				struct virtio_vsock_pkt *pkt)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
-	struct virtio_vsock_sock *vvs = vsk->trans;
 	int err = 0;
 
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
-		pkt->len = le32_to_cpu(pkt->hdr.len);
-		pkt->off = 0;
-
-		spin_lock_bh(&vvs->rx_lock);
-		virtio_transport_inc_rx_pkt(vvs, pkt);
-		list_add_tail(&pkt->list, &vvs->rx_queue);
-		spin_unlock_bh(&vvs->rx_lock);
-
+		virtio_transport_recv_enqueue(vsk, pkt);
 		sk->sk_data_ready(sk);
 		return err;
 	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
-- 
2.20.1
Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 2/5] vsock/virtio: reduce credit update messages
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7d973903f52e..49fc9d20bc43 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -41,6 +41,7 @@ struct virtio_vsock_sock {
 
 	/* Protected by rx_lock */
 	u32 fwd_cnt;
+	u32 last_fwd_cnt;
 	u32 rx_bytes;
 	struct list_head rx_queue;
 };
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index 095221f94786..a85559d4d974 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct
virtio_vsock_sock *vvs,
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct
virtio_vsock_pkt *pkt)
 {
 	spin_lock_bh(&vvs->tx_lock);
+	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
 	spin_unlock_bh(&vvs->tx_lock);
@@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	struct virtio_vsock_pkt *pkt;
 	size_t bytes, total = 0;
+	u32 free_space;
 	int err = -EFAULT;
 
 	spin_lock_bh(&vvs->rx_lock);
@@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 			virtio_transport_free_pkt(pkt);
 		}
 	}
+
+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
 	spin_unlock_bh(&vvs->rx_lock);
 
-	/* Send a credit pkt to peer */
-	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
-					    NULL);
+	/* We send a credit update only when the space available seen
+	 * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE
+	 */
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		virtio_transport_send_credit_update(vsk,
+						    VIRTIO_VSOCK_TYPE_STREAM,
+						    NULL);
+	}
 
 	return total;
 
-- 
2.20.1
Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.
Move also buf_alloc under the same lock.
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/virtio_vsock.h            | 2 +-
 net/vmw_vsock/virtio_transport_common.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 49fc9d20bc43..4c7781f4b29b 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -35,7 +35,6 @@ struct virtio_vsock_sock {
 
 	/* Protected by tx_lock */
 	u32 tx_cnt;
-	u32 buf_alloc;
 	u32 peer_fwd_cnt;
 	u32 peer_buf_alloc;
 
@@ -43,6 +42,7 @@ struct virtio_vsock_sock {
 	u32 fwd_cnt;
 	u32 last_fwd_cnt;
 	u32 rx_bytes;
+	u32 buf_alloc;
 	struct list_head rx_queue;
 };
 
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index a85559d4d974..34a2b42313b7 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct
virtio_vsock_sock *vvs,
 
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct
virtio_vsock_pkt *pkt)
 {
-	spin_lock_bh(&vvs->tx_lock);
+	spin_lock_bh(&vvs->rx_lock);
 	vvs->last_fwd_cnt = vvs->fwd_cnt;
 	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
 	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
-	spin_unlock_bh(&vvs->tx_lock);
+	spin_unlock_bh(&vvs->rx_lock);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
 
-- 
2.20.1
Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vsock.c                   | 66 ++++++++++++++++++-------
 net/vmw_vsock/virtio_transport_common.c | 15 ++++--
 2 files changed, 60 insertions(+), 21 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6c8390a2af52..9f57736fe15e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
-		size_t len;
+		size_t iov_len, payload_len;
 		int head;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = iov_length(&vq->iov[out], in);
-		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+		iov_len = iov_length(&vq->iov[out], in);
+		if (iov_len < sizeof(pkt->hdr)) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			break;
+		}
+
+		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+		payload_len = pkt->len - pkt->off;
+
+		/* If the packet is greater than the space available in the
+		 * buffer, we split it using multiple buffers.
+		 */
+		if (payload_len > iov_len - sizeof(pkt->hdr))
+			payload_len = iov_len - sizeof(pkt->hdr);
+
+		/* Set the correct length in the header */
+		pkt->hdr.len = cpu_to_le32(payload_len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
 		if (nbytes != sizeof(pkt->hdr)) {
@@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
-		if (nbytes != pkt->len) {
+		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
+				      &iov_iter);
+		if (nbytes != payload_len) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt buf\n");
 			break;
 		}
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
 		added = true;
 
-		if (pkt->reply) {
-			int val;
-
-			val = atomic_dec_return(&vsock->queued_replies);
-
-			/* Do we have resources to resume tx processing? */
-			if (val + 1 == tx_vq->num)
-				restart_tx = true;
-		}
-
 		/* Deliver to monitoring devices all correctly transmitted
 		 * packets.
 		 */
 		virtio_transport_deliver_tap_pkt(pkt);
 
-		total_len += pkt->len;
-		virtio_transport_free_pkt(pkt);
+		pkt->off += payload_len;
+		total_len += payload_len;
+
+		/* If we didn't send all the payload we can requeue the packet
+		 * to send it with the next available buffer.
+		 */
+		if (pkt->off < pkt->len) {
+			spin_lock_bh(&vsock->send_pkt_list_lock);
+			list_add(&pkt->list, &vsock->send_pkt_list);
+			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		} else {
+			if (pkt->reply) {
+				int val;
+
+				val = atomic_dec_return(&vsock->queued_replies);
+
+				/* Do we have resources to resume tx
+				 * processing?
+				 */
+				if (val + 1 == tx_vq->num)
+					restart_tx = true;
+			}
+
+			virtio_transport_free_pkt(pkt);
+		}
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
 	if (added)
 		vhost_signal(&vsock->dev, vq);
diff --git a/net/vmw_vsock/virtio_transport_common.c
b/net/vmw_vsock/virtio_transport_common.c
index 34a2b42313b7..56fab3f03d0e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void
*opaque)
 	struct virtio_vsock_pkt *pkt = opaque;
 	struct af_vsockmon_hdr *hdr;
 	struct sk_buff *skb;
+	size_t payload_len;
+	void *payload_buf;
 
-	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+	/* A packet could be split to fit the RX buffer, so we can retrieve
+	 * the payload length from the header and the buffer pointer taking
+	 * care of the offset in the original packet.
+	 */
+	payload_len = le32_to_cpu(pkt->hdr.len);
+	payload_buf = pkt->buf + pkt->off;
+
+	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
 			GFP_ATOMIC);
 	if (!skb)
 		return NULL;
@@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void
*opaque)
 
 	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
 
-	if (pkt->len) {
-		skb_put_data(skb, pkt->buf, pkt->len);
+	if (payload_len) {
+		skb_put_data(skb, payload_buf, payload_len);
 	}
 
 	return skb;
-- 
2.20.1
Stefano Garzarella
2019-Jul-17  11:30 UTC
[PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
Since now we are able to split packets, we can avoid limiting their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE. Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size. Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> --- net/vmw_vsock/virtio_transport_common.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 56fab3f03d0e..94cc0fa3e848 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, vvs = vsk->trans; /* we can send less than pkt_len bytes */ - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE) - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; /* virtio_transport_get_credit might return less than pkt_len credit */ pkt_len = virtio_transport_get_credit(vvs, pkt_len); -- 2.20.1
Michael S. Tsirkin
2019-Jul-17  14:51 UTC
[PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use > the same spinlock also if we are in the TX path. > > Move also buf_alloc under the same lock. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>Wait a second is this a bugfix? If it's used under the wrong lock won't values get corrupted? Won't traffic then stall or more data get to sent than credits?> --- > include/linux/virtio_vsock.h | 2 +- > net/vmw_vsock/virtio_transport_common.c | 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > index 49fc9d20bc43..4c7781f4b29b 100644 > --- a/include/linux/virtio_vsock.h > +++ b/include/linux/virtio_vsock.h > @@ -35,7 +35,6 @@ struct virtio_vsock_sock { > > /* Protected by tx_lock */ > u32 tx_cnt; > - u32 buf_alloc; > u32 peer_fwd_cnt; > u32 peer_buf_alloc; > > @@ -43,6 +42,7 @@ struct virtio_vsock_sock { > u32 fwd_cnt; > u32 last_fwd_cnt; > u32 rx_bytes; > + u32 buf_alloc; > struct list_head rx_queue; > }; > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > index a85559d4d974..34a2b42313b7 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, > > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt) > { > - spin_lock_bh(&vvs->tx_lock); > + spin_lock_bh(&vvs->rx_lock); > vvs->last_fwd_cnt = vvs->fwd_cnt; > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt); > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc); > - spin_unlock_bh(&vvs->tx_lock); > + spin_unlock_bh(&vvs->rx_lock); > } > EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt); > > -- > 2.20.1
Michael S. Tsirkin
2019-Jul-17  14:54 UTC
[PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:> If the packets to sent to the guest are bigger than the buffer > available, we can split them, using multiple buffers and fixing > the length in the packet header. > This is safe since virtio-vsock supports only stream sockets. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>So how does it work right now? If an app does sendmsg with a 64K buffer and the other side publishes 4K buffers - does it just stall?> --- > drivers/vhost/vsock.c | 66 ++++++++++++++++++------- > net/vmw_vsock/virtio_transport_common.c | 15 ++++-- > 2 files changed, 60 insertions(+), 21 deletions(-) > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > index 6c8390a2af52..9f57736fe15e 100644 > --- a/drivers/vhost/vsock.c > +++ b/drivers/vhost/vsock.c > @@ -102,7 +102,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, > struct iov_iter iov_iter; > unsigned out, in; > size_t nbytes; > - size_t len; > + size_t iov_len, payload_len; > int head; > > spin_lock_bh(&vsock->send_pkt_list_lock); > @@ -147,8 +147,24 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, > break; > } > > - len = iov_length(&vq->iov[out], in); > - iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len); > + iov_len = iov_length(&vq->iov[out], in); > + if (iov_len < sizeof(pkt->hdr)) { > + virtio_transport_free_pkt(pkt); > + vq_err(vq, "Buffer len [%zu] too small\n", iov_len); > + break; > + } > + > + iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len); > + payload_len = pkt->len - pkt->off; > + > + /* If the packet is greater than the space available in the > + * buffer, we split it using multiple buffers. > + */ > + if (payload_len > iov_len - sizeof(pkt->hdr)) > + payload_len = iov_len - sizeof(pkt->hdr); > + > + /* Set the correct length in the header */ > + pkt->hdr.len = cpu_to_le32(payload_len); > > nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); > if (nbytes != sizeof(pkt->hdr)) { > @@ -157,33 +173,47 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, > break; > } > > - nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter); > - if (nbytes != pkt->len) { > + nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len, > + &iov_iter); > + if (nbytes != payload_len) { > virtio_transport_free_pkt(pkt); > vq_err(vq, "Faulted on copying pkt buf\n"); > break; > } > > - vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len); > + vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len); > added = true; > > - if (pkt->reply) { > - int val; > - > - val = atomic_dec_return(&vsock->queued_replies); > - > - /* Do we have resources to resume tx processing? */ > - if (val + 1 == tx_vq->num) > - restart_tx = true; > - } > - > /* Deliver to monitoring devices all correctly transmitted > * packets. > */ > virtio_transport_deliver_tap_pkt(pkt); > > - total_len += pkt->len; > - virtio_transport_free_pkt(pkt); > + pkt->off += payload_len; > + total_len += payload_len; > + > + /* If we didn't send all the payload we can requeue the packet > + * to send it with the next available buffer. > + */ > + if (pkt->off < pkt->len) { > + spin_lock_bh(&vsock->send_pkt_list_lock); > + list_add(&pkt->list, &vsock->send_pkt_list); > + spin_unlock_bh(&vsock->send_pkt_list_lock); > + } else { > + if (pkt->reply) { > + int val; > + > + val = atomic_dec_return(&vsock->queued_replies); > + > + /* Do we have resources to resume tx > + * processing? > + */ > + if (val + 1 == tx_vq->num) > + restart_tx = true; > + } > + > + virtio_transport_free_pkt(pkt); > + } > } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); > if (added) > vhost_signal(&vsock->dev, vq); > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > index 34a2b42313b7..56fab3f03d0e 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -97,8 +97,17 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) > struct virtio_vsock_pkt *pkt = opaque; > struct af_vsockmon_hdr *hdr; > struct sk_buff *skb; > + size_t payload_len; > + void *payload_buf; > > - skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len, > + /* A packet could be split to fit the RX buffer, so we can retrieve > + * the payload length from the header and the buffer pointer taking > + * care of the offset in the original packet. > + */ > + payload_len = le32_to_cpu(pkt->hdr.len); > + payload_buf = pkt->buf + pkt->off; > + > + skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len, > GFP_ATOMIC); > if (!skb) > return NULL; > @@ -138,8 +147,8 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) > > skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr)); > > - if (pkt->len) { > - skb_put_data(skb, pkt->buf, pkt->len); > + if (payload_len) { > + skb_put_data(skb, payload_buf, payload_len); > } > > return skb; > -- > 2.20.1
Michael S. Tsirkin
2019-Jul-17  14:59 UTC
[PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:> Since now we are able to split packets, we can avoid limiting > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE. > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max > packet size. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>OK so this is kind of like GSO where we are passing 64K packets to the vsock and then split at the low level.> --- > net/vmw_vsock/virtio_transport_common.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > index 56fab3f03d0e..94cc0fa3e848 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -181,8 +181,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, > vvs = vsk->trans; > > /* we can send less than pkt_len bytes */ > - if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE) > - pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; > + if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) > + pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; > > /* virtio_transport_get_credit might return less than pkt_len credit */ > pkt_len = virtio_transport_get_credit(vvs, pkt_len); > -- > 2.20.1
Stefan Hajnoczi
2019-Jul-22  08:36 UTC
[PATCH v4 2/5] vsock/virtio: reduce credit update messages
On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:> In order to reduce the number of credit update messages, > we send them only when the space available seen by the > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > --- > include/linux/virtio_vsock.h | 1 + > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++--- > 2 files changed, 14 insertions(+), 3 deletions(-)It's an arbitrary limit but the risk of regressions is low since the credit update traffic was so excessive: Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190722/6ee4edd0/attachment.sig>
Stefan Hajnoczi
2019-Jul-22  08:53 UTC
[PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:> fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use > the same spinlock also if we are in the TX path. > > Move also buf_alloc under the same lock. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > --- > include/linux/virtio_vsock.h | 2 +- > net/vmw_vsock/virtio_transport_common.c | 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-)Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190722/084558fe/attachment.sig>
Stefan Hajnoczi
2019-Jul-22  09:06 UTC
[PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:> If the packets to sent to the guest are bigger than the buffer > available, we can split them, using multiple buffers and fixing > the length in the packet header. > This is safe since virtio-vsock supports only stream sockets. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > --- > drivers/vhost/vsock.c | 66 ++++++++++++++++++------- > net/vmw_vsock/virtio_transport_common.c | 15 ++++-- > 2 files changed, 60 insertions(+), 21 deletions(-)Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190722/6d9bd8f4/attachment.sig>
Stefan Hajnoczi
2019-Jul-22  09:07 UTC
[PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:> Since now we are able to split packets, we can avoid limiting > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE. > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max > packet size. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > --- > net/vmw_vsock/virtio_transport_common.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-)Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190722/5d7b10df/attachment.sig>
Stefan Hajnoczi
2019-Jul-22  09:08 UTC
[PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:> This series tries to increase the throughput of virtio-vsock with slight > changes. > While I was testing the v2 of this series I discovered an huge use of memory, > so I added patch 1 to mitigate this issue. I put it in this series in order > to better track the performance trends. > > v4: > - rebased all patches on current master (conflicts is Patch 4) > - Patch 1: added Stefan's R-b > - Patch 3: removed lock when buf_alloc is written [David]; > moved this patch after "vsock/virtio: reduce credit update messages" > to make it clearer > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some > conflictsStefano: Do you want to continue experimenting before we merge this patch series? The code looks functionally correct and the performance increases, so I'm happy for it to be merged. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190722/a2983f78/attachment.sig>
Stefano Garzarella
2019-Jul-22  09:14 UTC
[PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote: > > This series tries to increase the throughput of virtio-vsock with slight > > changes. > > While I was testing the v2 of this series I discovered an huge use of memory, > > so I added patch 1 to mitigate this issue. I put it in this series in order > > to better track the performance trends. > > > > v4: > > - rebased all patches on current master (conflicts is Patch 4) > > - Patch 1: added Stefan's R-b > > - Patch 3: removed lock when buf_alloc is written [David]; > > moved this patch after "vsock/virtio: reduce credit update messages" > > to make it clearer > > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some > > conflicts > > Stefano: Do you want to continue experimenting before we merge this > patch series? The code looks functionally correct and the performance > increases, so I'm happy for it to be merged.I think we can merge this series. I'll continue to do other experiments (e.g. removing TX workers, allocating pages, etc.) but I think these changes are prerequisites for the other patches, so we can merge them. Thank you very much for the reviews! Stefano
Michael S. Tsirkin
2019-Jul-29  13:59 UTC
[PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:> This series tries to increase the throughput of virtio-vsock with slight > changes. > While I was testing the v2 of this series I discovered an huge use of memory, > so I added patch 1 to mitigate this issue. I put it in this series in order > to better track the performance trends.Series: Acked-by: Michael S. Tsirkin <mst at redhat.com> Can this go into net-next?> v4: > - rebased all patches on current master (conflicts is Patch 4) > - Patch 1: added Stefan's R-b > - Patch 3: removed lock when buf_alloc is written [David]; > moved this patch after "vsock/virtio: reduce credit update messages" > to make it clearer > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some > conflicts > > v3: https://patchwork.kernel.org/cover/10970145 > > v2: https://patchwork.kernel.org/cover/10938743 > > v1: https://patchwork.kernel.org/cover/10885431 > > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK > support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'. > > A brief description of patches: > - Patches 1: limit the memory usage with an extra copy for small packets > - Patches 2+3: reduce the number of credit update messages sent to the > transmitter > - Patches 4+5: allow the host to split packets on multiple buffers and use > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed > > host -> guest [Gbps] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.032 0.030 0.048 0.051 > 64 0.061 0.059 0.108 0.117 > 128 0.122 0.112 0.227 0.234 > 256 0.244 0.241 0.418 0.415 > 512 0.459 0.466 0.847 0.865 > 1K 0.927 0.919 1.657 1.641 > 2K 1.884 1.813 3.262 3.269 > 4K 3.378 3.326 6.044 6.195 > 8K 5.637 5.676 10.141 11.287 > 16K 8.250 8.402 15.976 16.736 > 32K 13.327 13.204 19.013 20.515 > 64K 21.241 21.341 20.973 21.879 > 128K 21.851 22.354 21.816 23.203 > 256K 21.408 21.693 21.846 24.088 > 512K 21.600 21.899 21.921 24.106 > > guest -> host [Gbps] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.045 0.046 0.057 0.057 > 64 0.089 0.091 0.103 0.104 > 128 0.170 0.179 0.192 0.200 > 256 0.364 0.351 0.361 0.379 > 512 0.709 0.699 0.731 0.790 > 1K 1.399 1.407 1.395 1.427 > 2K 2.670 2.684 2.745 2.835 > 4K 5.171 5.199 5.305 5.451 > 8K 8.442 8.500 10.083 9.941 > 16K 12.305 12.259 13.519 15.385 > 32K 11.418 11.150 11.988 24.680 > 64K 10.778 10.659 11.589 35.273 > 128K 10.421 10.339 10.939 40.338 > 256K 10.300 9.719 10.508 36.562 > 512K 9.833 9.808 10.612 35.979 > > As Stefan suggested in the v1, I measured also the efficiency in this way: > efficiency = Mbps / (%CPU_Host + %CPU_Guest) > > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way, > but it's provided for free from iperf3 and could be an indication. > > host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.35 0.45 0.79 1.02 > 64 0.56 0.80 1.41 1.54 > 128 1.11 1.52 3.03 3.12 > 256 2.20 2.16 5.44 5.58 > 512 4.17 4.18 10.96 11.46 > 1K 8.30 8.26 20.99 20.89 > 2K 16.82 16.31 39.76 39.73 > 4K 30.89 30.79 74.07 75.73 > 8K 53.74 54.49 124.24 148.91 > 16K 80.68 83.63 200.21 232.79 > 32K 132.27 132.52 260.81 357.07 > 64K 229.82 230.40 300.19 444.18 > 128K 332.60 329.78 331.51 492.28 > 256K 331.06 337.22 339.59 511.59 > 512K 335.58 328.50 331.56 504.56 > > guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.43 0.43 0.53 0.56 > 64 0.85 0.86 1.04 1.10 > 128 1.63 1.71 2.07 2.13 > 256 3.48 3.35 4.02 4.22 > 512 6.80 6.67 7.97 8.63 > 1K 13.32 13.31 15.72 15.94 > 2K 25.79 25.92 30.84 30.98 > 4K 50.37 50.48 58.79 59.69 > 8K 95.90 96.15 107.04 110.33 > 16K 145.80 145.43 143.97 174.70 > 32K 147.06 144.74 146.02 282.48 > 64K 145.25 143.99 141.62 406.40 > 128K 149.34 146.96 147.49 489.34 > 256K 156.35 149.81 152.21 536.37 > 512K 151.65 150.74 151.52 519.93 > > [1] https://github.com/stefano-garzarella/iperf/ > > Stefano Garzarella (5): > vsock/virtio: limit the memory used per-socket > vsock/virtio: reduce credit update messages > vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() > vhost/vsock: split packets to send using multiple buffers > vsock/virtio: change the maximum packet size allowed > > drivers/vhost/vsock.c | 68 ++++++++++++----- > include/linux/virtio_vsock.h | 4 +- > net/vmw_vsock/virtio_transport.c | 1 + > net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++----- > 4 files changed, 134 insertions(+), 38 deletions(-) > > -- > 2.20.1
Michael S. Tsirkin
2019-Jul-29  14:04 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:> Since virtio-vsock was introduced, the buffers filled by the host > and pushed to the guest using the vring, are directly queued in > a per-socket list. These buffers are preallocated by the guest > with a fixed size (4 KB). > > The maximum amount of memory used by each socket should be > controlled by the credit mechanism. > The default credit available per-socket is 256 KB, but if we use > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > buffers, using up to 1 GB of memory per-socket. In addition, the > guest will continue to fill the vring with new 4 KB free buffers > to avoid starvation of other sockets. > > This patch mitigates this issue copying the payload of small > packets (< 128 bytes) into the buffer of last packet queued, in > order to avoid wasting memory. > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>This is good enough for net-next, but for net I think we should figure out how to address the issue completely. Can we make the accounting precise? What happens to performance if we do?> --- > drivers/vhost/vsock.c | 2 + > include/linux/virtio_vsock.h | 1 + > net/vmw_vsock/virtio_transport.c | 1 + > net/vmw_vsock/virtio_transport_common.c | 60 +++++++++++++++++++++---- > 4 files changed, 55 insertions(+), 9 deletions(-) > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > index 6a50e1d0529c..6c8390a2af52 100644 > --- a/drivers/vhost/vsock.c > +++ b/drivers/vhost/vsock.c > @@ -329,6 +329,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, > return NULL; > } > > + pkt->buf_len = pkt->len; > + > nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); > if (nbytes != pkt->len) { > vq_err(vq, "Expected %u byte payload, got %zu bytes\n", > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > index e223e2632edd..7d973903f52e 100644 > --- a/include/linux/virtio_vsock.h > +++ b/include/linux/virtio_vsock.h > @@ -52,6 +52,7 @@ struct virtio_vsock_pkt { > /* socket refcnt not held, only use for cancellation */ > struct vsock_sock *vsk; > void *buf; > + u32 buf_len; > u32 len; > u32 off; > bool reply; > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c > index 0815d1357861..082a30936690 100644 > --- a/net/vmw_vsock/virtio_transport.c > +++ b/net/vmw_vsock/virtio_transport.c > @@ -307,6 +307,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) > break; > } > > + pkt->buf_len = buf_len; > pkt->len = buf_len; > > sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > index 6f1a8aff65c5..095221f94786 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -26,6 +26,9 @@ > /* How long to wait for graceful shutdown of a connection */ > #define VSOCK_CLOSE_TIMEOUT (8 * HZ) > > +/* Threshold for detecting small packets to copy */ > +#define GOOD_COPY_LEN 128 > + > static const struct virtio_transport *virtio_transport_get_ops(void) > { > const struct vsock_transport *t = vsock_core_get_transport(); > @@ -64,6 +67,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, > pkt->buf = kmalloc(len, GFP_KERNEL); > if (!pkt->buf) > goto out_pkt; > + > + pkt->buf_len = len; > + > err = memcpy_from_msg(pkt->buf, info->msg, len); > if (err) > goto out; > @@ -841,24 +847,60 @@ virtio_transport_recv_connecting(struct sock *sk, > return err; > } > > +static void > +virtio_transport_recv_enqueue(struct vsock_sock *vsk, > + struct virtio_vsock_pkt *pkt) > +{ > + struct virtio_vsock_sock *vvs = vsk->trans; > + bool free_pkt = false; > + > + pkt->len = le32_to_cpu(pkt->hdr.len); > + pkt->off = 0; > + > + spin_lock_bh(&vvs->rx_lock); > + > + virtio_transport_inc_rx_pkt(vvs, pkt); > + > + /* Try to copy small packets into the buffer of last packet queued, > + * to avoid wasting memory queueing the entire buffer with a small > + * payload. > + */ > + if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) { > + struct virtio_vsock_pkt *last_pkt; > + > + last_pkt = list_last_entry(&vvs->rx_queue, > + struct virtio_vsock_pkt, list); > + > + /* If there is space in the last packet queued, we copy the > + * new packet in its buffer. > + */ > + if (pkt->len <= last_pkt->buf_len - last_pkt->len) { > + memcpy(last_pkt->buf + last_pkt->len, pkt->buf, > + pkt->len); > + last_pkt->len += pkt->len; > + free_pkt = true; > + goto out; > + } > + } > + > + list_add_tail(&pkt->list, &vvs->rx_queue); > + > +out: > + spin_unlock_bh(&vvs->rx_lock); > + if (free_pkt) > + virtio_transport_free_pkt(pkt); > +} > + > static int > virtio_transport_recv_connected(struct sock *sk, > struct virtio_vsock_pkt *pkt) > { > struct vsock_sock *vsk = vsock_sk(sk); > - struct virtio_vsock_sock *vvs = vsk->trans; > int err = 0; > > switch (le16_to_cpu(pkt->hdr.op)) { > case VIRTIO_VSOCK_OP_RW: > - pkt->len = le32_to_cpu(pkt->hdr.len); > - pkt->off = 0; > - > - spin_lock_bh(&vvs->rx_lock); > - virtio_transport_inc_rx_pkt(vvs, pkt); > - list_add_tail(&pkt->list, &vvs->rx_queue); > - spin_unlock_bh(&vvs->rx_lock); > - > + virtio_transport_recv_enqueue(vsk, pkt); > sk->sk_data_ready(sk); > return err; > case VIRTIO_VSOCK_OP_CREDIT_UPDATE: > -- > 2.20.1
Stefano Garzarella
2019-Jul-30  09:40 UTC
[PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput
On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote: > > This series tries to increase the throughput of virtio-vsock with slight > > changes. > > While I was testing the v2 of this series I discovered an huge use of memory, > > so I added patch 1 to mitigate this issue. I put it in this series in order > > to better track the performance trends. > > Series: > > Acked-by: Michael S. Tsirkin <mst at redhat.com> > > Can this go into net-next? >I think so. Michael, Stefan thanks to ack the series! Should I resend it with net-next tag? Thanks, Stefano
Michael S. Tsirkin
2019-Sep-01  06:56 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:> On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > Since virtio-vsock was introduced, the buffers filled by the host > > > and pushed to the guest using the vring, are directly queued in > > > a per-socket list. These buffers are preallocated by the guest > > > with a fixed size (4 KB). > > > > > > The maximum amount of memory used by each socket should be > > > controlled by the credit mechanism. > > > The default credit available per-socket is 256 KB, but if we use > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > guest will continue to fill the vring with new 4 KB free buffers > > > to avoid starvation of other sockets. > > > > > > This patch mitigates this issue copying the payload of small > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > order to avoid wasting memory. > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > > > > This is good enough for net-next, but for net I think we > > should figure out how to address the issue completely. > > Can we make the accounting precise? What happens to > > performance if we do? > > > > Since I'm back from holidays, I'm restarting this thread to figure out > how to address the issue completely. > > I did a better analysis of the credit mechanism that we implemented in > virtio-vsock to get a clearer view and I'd share it with you: > > This issue affect only the "host->guest" path. In this case, when the > host wants to send a packet to the guest, it uses a "free" buffer > allocated by the guest (4KB). > The "free" buffers available for the host are shared between all > sockets, instead, the credit mechanism is per-socket, I think to > avoid the starvation of others sockets. > The guests re-fill the "free" queue when the available buffers are > less than half. > > Each peer have these variables in the per-socket state: > /* local vars */ > buf_alloc /* max bytes usable by this socket > [exposed to the other peer] */ > fwd_cnt /* increased when RX packet is consumed by the > user space [exposed to the other peer] */ > tx_cnt /* increased when TX packet is sent to the other peer */ > > /* remote vars */ > peer_buf_alloc /* peer's buf_alloc */ > peer_fwd_cnt /* peer's fwd_cnt */ > > When a peer sends a packet, it increases the 'tx_cnt'; when the > receiver consumes the packet (copy it to the user-space buffer), it > increases the 'fwd_cnt'. > Note: increments are made considering the payload length and not the > buffer length. > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in > all packet headers or with an explicit CREDIT_UPDATE packet. > > The local 'buf_alloc' value can be modified by the user space using > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE. > > Before to send a packet, the peer checks the space available: > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt) > and it will send up to credit_available bytes to the other peer. > > Possible solutions considering Michael's advice: > 1. Use the buffer length instead of the payload length when we increment > the counters: > - This approach will account precisely the memory used per socket. > - This requires changes in both guest and host. > - It is not compatible with old drivers, so a feature should be negotiated. > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in > the socket queue but not used. (e.g. 256 byte used on 4K available in > the buffer) > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used. > - This should be compatible also with old drivers. > > Maybe the second is less invasive, but will it be too tricky? > Any other advice or suggestions? > > Thanks in advance, > StefanoOK let me try to clarify. The idea is this: Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This means that in the worst case (128 byte packets), each byte of credit in the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need to also account for the virtio_vsock_pkt since I think it's kept around until userspace consumes it. Thus given X buf alloc allowed in the socket, we should publish X/16 credits to the other side. This will ensure the other side does not send more than X/16 bytes for a given socket and thus we won't need to allocate more than X bytes to hold the data. We can play with the copy break value to tweak this.
Stefan Hajnoczi
2019-Sep-02  08:39 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Sun, Sep 01, 2019 at 02:56:44AM -0400, Michael S. Tsirkin wrote:> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote: > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > > Since virtio-vsock was introduced, the buffers filled by the host > > > > and pushed to the guest using the vring, are directly queued in > > > > a per-socket list. These buffers are preallocated by the guest > > > > with a fixed size (4 KB). > > > > > > > > The maximum amount of memory used by each socket should be > > > > controlled by the credit mechanism. > > > > The default credit available per-socket is 256 KB, but if we use > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > > guest will continue to fill the vring with new 4 KB free buffers > > > > to avoid starvation of other sockets. > > > > > > > > This patch mitigates this issue copying the payload of small > > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > > order to avoid wasting memory. > > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > > > > > > This is good enough for net-next, but for net I think we > > > should figure out how to address the issue completely. > > > Can we make the accounting precise? What happens to > > > performance if we do? > > > > > > > Since I'm back from holidays, I'm restarting this thread to figure out > > how to address the issue completely. > > > > I did a better analysis of the credit mechanism that we implemented in > > virtio-vsock to get a clearer view and I'd share it with you: > > > > This issue affect only the "host->guest" path. In this case, when the > > host wants to send a packet to the guest, it uses a "free" buffer > > allocated by the guest (4KB). > > The "free" buffers available for the host are shared between all > > sockets, instead, the credit mechanism is per-socket, I think to > > avoid the starvation of others sockets. > > The guests re-fill the "free" queue when the available buffers are > > less than half. > > > > Each peer have these variables in the per-socket state: > > /* local vars */ > > buf_alloc /* max bytes usable by this socket > > [exposed to the other peer] */ > > fwd_cnt /* increased when RX packet is consumed by the > > user space [exposed to the other peer] */ > > tx_cnt /* increased when TX packet is sent to the other peer */ > > > > /* remote vars */ > > peer_buf_alloc /* peer's buf_alloc */ > > peer_fwd_cnt /* peer's fwd_cnt */ > > > > When a peer sends a packet, it increases the 'tx_cnt'; when the > > receiver consumes the packet (copy it to the user-space buffer), it > > increases the 'fwd_cnt'. > > Note: increments are made considering the payload length and not the > > buffer length. > > > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in > > all packet headers or with an explicit CREDIT_UPDATE packet. > > > > The local 'buf_alloc' value can be modified by the user space using > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE. > > > > Before to send a packet, the peer checks the space available: > > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt) > > and it will send up to credit_available bytes to the other peer. > > > > Possible solutions considering Michael's advice: > > 1. Use the buffer length instead of the payload length when we increment > > the counters: > > - This approach will account precisely the memory used per socket. > > - This requires changes in both guest and host. > > - It is not compatible with old drivers, so a feature should be negotiated. > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in > > the socket queue but not used. (e.g. 256 byte used on 4K available in > > the buffer) > > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used. > > - This should be compatible also with old drivers. > > > > Maybe the second is less invasive, but will it be too tricky? > > Any other advice or suggestions? > > > > Thanks in advance, > > Stefano > > OK let me try to clarify. The idea is this: > > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This > means that in the worst case (128 byte packets), each byte of credit in > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need > to also account for the virtio_vsock_pkt since I think it's kept around > until userspace consumes it. > > Thus given X buf alloc allowed in the socket, we should publish X/16 > credits to the other side. This will ensure the other side does not send > more than X/16 bytes for a given socket and thus we won't need to > allocate more than X bytes to hold the data. > > We can play with the copy break value to tweak this.This seems like a reasonable solution. Hopefully the benchmark results will come out okay too. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20190902/ebbe4033/attachment.sig>
Michael S. Tsirkin
2019-Sep-03  04:38 UTC
[PATCH v4 2/5] vsock/virtio: reduce credit update messages
On Wed, Jul 17, 2019 at 01:30:27PM +0200, Stefano Garzarella wrote:> In order to reduce the number of credit update messages, > we send them only when the space available seen by the > transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > --- > include/linux/virtio_vsock.h | 1 + > net/vmw_vsock/virtio_transport_common.c | 16 +++++++++++++--- > 2 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > index 7d973903f52e..49fc9d20bc43 100644 > --- a/include/linux/virtio_vsock.h > +++ b/include/linux/virtio_vsock.h > @@ -41,6 +41,7 @@ struct virtio_vsock_sock { > > /* Protected by rx_lock */ > u32 fwd_cnt; > + u32 last_fwd_cnt; > u32 rx_bytes; > struct list_head rx_queue; > }; > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c > index 095221f94786..a85559d4d974 100644 > --- a/net/vmw_vsock/virtio_transport_common.c > +++ b/net/vmw_vsock/virtio_transport_common.c > @@ -211,6 +211,7 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt) > { > spin_lock_bh(&vvs->tx_lock); > + vvs->last_fwd_cnt = vvs->fwd_cnt; > pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt); > pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc); > spin_unlock_bh(&vvs->tx_lock); > @@ -261,6 +262,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, > struct virtio_vsock_sock *vvs = vsk->trans; > struct virtio_vsock_pkt *pkt; > size_t bytes, total = 0; > + u32 free_space; > int err = -EFAULT; > > spin_lock_bh(&vvs->rx_lock); > @@ -291,11 +293,19 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, > virtio_transport_free_pkt(pkt); > } > } > + > + free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt); > + > spin_unlock_bh(&vvs->rx_lock); > > - /* Send a credit pkt to peer */ > - virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM, > - NULL); > + /* We send a credit update only when the space available seen > + * by the transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZEThis is just repeating what code does though. Please include the *reason* for the condition. E.g. here's a better comment: /* To reduce number of credit update messages, * don't update credits as long as lots of space is available. * Note: the limit chosen here is arbitrary. Setting the limit * too high causes extra messages. Too low causes transmitter * stalls. As stalls are in theory more expensive than extra * messages, we set the limit to a high value. TODO: experiment * with different values. */> + */ > + if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { > + virtio_transport_send_credit_update(vsk, > + VIRTIO_VSOCK_TYPE_STREAM, > + NULL); > + } > > return total; > > -- > 2.20.1
Michael S. Tsirkin
2019-Sep-03  08:02 UTC
request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput)
Patches 1,3 and 4 are needed for stable. Dave, could you queue them there please? On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:> This series tries to increase the throughput of virtio-vsock with slight > changes. > While I was testing the v2 of this series I discovered an huge use of memory, > so I added patch 1 to mitigate this issue. I put it in this series in order > to better track the performance trends. > > v4: > - rebased all patches on current master (conflicts is Patch 4) > - Patch 1: added Stefan's R-b > - Patch 3: removed lock when buf_alloc is written [David]; > moved this patch after "vsock/virtio: reduce credit update messages" > to make it clearer > - Patch 4: vhost_exceeds_weight() is recently introduced, so I've solved some > conflicts > > v3: https://patchwork.kernel.org/cover/10970145 > > v2: https://patchwork.kernel.org/cover/10938743 > > v1: https://patchwork.kernel.org/cover/10885431 > > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK > support. As Micheal suggested in the v1, I booted host and guest with 'nosmap'. > > A brief description of patches: > - Patches 1: limit the memory usage with an extra copy for small packets > - Patches 2+3: reduce the number of credit update messages sent to the > transmitter > - Patches 4+5: allow the host to split packets on multiple buffers and use > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed > > host -> guest [Gbps] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.032 0.030 0.048 0.051 > 64 0.061 0.059 0.108 0.117 > 128 0.122 0.112 0.227 0.234 > 256 0.244 0.241 0.418 0.415 > 512 0.459 0.466 0.847 0.865 > 1K 0.927 0.919 1.657 1.641 > 2K 1.884 1.813 3.262 3.269 > 4K 3.378 3.326 6.044 6.195 > 8K 5.637 5.676 10.141 11.287 > 16K 8.250 8.402 15.976 16.736 > 32K 13.327 13.204 19.013 20.515 > 64K 21.241 21.341 20.973 21.879 > 128K 21.851 22.354 21.816 23.203 > 256K 21.408 21.693 21.846 24.088 > 512K 21.600 21.899 21.921 24.106 > > guest -> host [Gbps] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.045 0.046 0.057 0.057 > 64 0.089 0.091 0.103 0.104 > 128 0.170 0.179 0.192 0.200 > 256 0.364 0.351 0.361 0.379 > 512 0.709 0.699 0.731 0.790 > 1K 1.399 1.407 1.395 1.427 > 2K 2.670 2.684 2.745 2.835 > 4K 5.171 5.199 5.305 5.451 > 8K 8.442 8.500 10.083 9.941 > 16K 12.305 12.259 13.519 15.385 > 32K 11.418 11.150 11.988 24.680 > 64K 10.778 10.659 11.589 35.273 > 128K 10.421 10.339 10.939 40.338 > 256K 10.300 9.719 10.508 36.562 > 512K 9.833 9.808 10.612 35.979 > > As Stefan suggested in the v1, I measured also the efficiency in this way: > efficiency = Mbps / (%CPU_Host + %CPU_Guest) > > The '%CPU_Guest' is taken inside the VM. I know that it is not the best way, > but it's provided for free from iperf3 and could be an indication. > > host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.35 0.45 0.79 1.02 > 64 0.56 0.80 1.41 1.54 > 128 1.11 1.52 3.03 3.12 > 256 2.20 2.16 5.44 5.58 > 512 4.17 4.18 10.96 11.46 > 1K 8.30 8.26 20.99 20.89 > 2K 16.82 16.31 39.76 39.73 > 4K 30.89 30.79 74.07 75.73 > 8K 53.74 54.49 124.24 148.91 > 16K 80.68 83.63 200.21 232.79 > 32K 132.27 132.52 260.81 357.07 > 64K 229.82 230.40 300.19 444.18 > 128K 332.60 329.78 331.51 492.28 > 256K 331.06 337.22 339.59 511.59 > 512K 335.58 328.50 331.56 504.56 > > guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)] > pkt_size before opt p 1 p 2+3 p 4+5 > > 32 0.43 0.43 0.53 0.56 > 64 0.85 0.86 1.04 1.10 > 128 1.63 1.71 2.07 2.13 > 256 3.48 3.35 4.02 4.22 > 512 6.80 6.67 7.97 8.63 > 1K 13.32 13.31 15.72 15.94 > 2K 25.79 25.92 30.84 30.98 > 4K 50.37 50.48 58.79 59.69 > 8K 95.90 96.15 107.04 110.33 > 16K 145.80 145.43 143.97 174.70 > 32K 147.06 144.74 146.02 282.48 > 64K 145.25 143.99 141.62 406.40 > 128K 149.34 146.96 147.49 489.34 > 256K 156.35 149.81 152.21 536.37 > 512K 151.65 150.74 151.52 519.93 > > [1] https://github.com/stefano-garzarella/iperf/ > > Stefano Garzarella (5): > vsock/virtio: limit the memory used per-socket > vsock/virtio: reduce credit update messages > vsock/virtio: fix locking in virtio_transport_inc_tx_pkt() > vhost/vsock: split packets to send using multiple buffers > vsock/virtio: change the maximum packet size allowed > > drivers/vhost/vsock.c | 68 ++++++++++++----- > include/linux/virtio_vsock.h | 4 +- > net/vmw_vsock/virtio_transport.c | 1 + > net/vmw_vsock/virtio_transport_common.c | 99 ++++++++++++++++++++----- > 4 files changed, 134 insertions(+), 38 deletions(-) > > -- > 2.20.1
Stefano Garzarella
2019-Oct-11  13:40 UTC
[PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin <mst at redhat.com> wrote:> On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote: > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > > Since virtio-vsock was introduced, the buffers filled by the host > > > > and pushed to the guest using the vring, are directly queued in > > > > a per-socket list. These buffers are preallocated by the guest > > > > with a fixed size (4 KB). > > > > > > > > The maximum amount of memory used by each socket should be > > > > controlled by the credit mechanism. > > > > The default credit available per-socket is 256 KB, but if we use > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > > guest will continue to fill the vring with new 4 KB free buffers > > > > to avoid starvation of other sockets. > > > > > > > > This patch mitigates this issue copying the payload of small > > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > > order to avoid wasting memory. > > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com> > > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com> > > > > > > This is good enough for net-next, but for net I think we > > > should figure out how to address the issue completely. > > > Can we make the accounting precise? What happens to > > > performance if we do? > > > > > > > Since I'm back from holidays, I'm restarting this thread to figure out > > how to address the issue completely. > > > > I did a better analysis of the credit mechanism that we implemented in > > virtio-vsock to get a clearer view and I'd share it with you: > > > > This issue affect only the "host->guest" path. In this case, when the > > host wants to send a packet to the guest, it uses a "free" buffer > > allocated by the guest (4KB). > > The "free" buffers available for the host are shared between all > > sockets, instead, the credit mechanism is per-socket, I think to > > avoid the starvation of others sockets. > > The guests re-fill the "free" queue when the available buffers are > > less than half. > > > > Each peer have these variables in the per-socket state: > > /* local vars */ > > buf_alloc /* max bytes usable by this socket > > [exposed to the other peer] */ > > fwd_cnt /* increased when RX packet is consumed by the > > user space [exposed to the other peer] */ > > tx_cnt /* increased when TX packet is sent to the other peer */ > > > > /* remote vars */ > > peer_buf_alloc /* peer's buf_alloc */ > > peer_fwd_cnt /* peer's fwd_cnt */ > > > > When a peer sends a packet, it increases the 'tx_cnt'; when the > > receiver consumes the packet (copy it to the user-space buffer), it > > increases the 'fwd_cnt'. > > Note: increments are made considering the payload length and not the > > buffer length. > > > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in > > all packet headers or with an explicit CREDIT_UPDATE packet. > > > > The local 'buf_alloc' value can be modified by the user space using > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE. > > > > Before to send a packet, the peer checks the space available: > > credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt) > > and it will send up to credit_available bytes to the other peer. > > > > Possible solutions considering Michael's advice: > > 1. Use the buffer length instead of the payload length when we increment > > the counters: > > - This approach will account precisely the memory used per socket. > > - This requires changes in both guest and host. > > - It is not compatible with old drivers, so a feature should be negotiated. > > 2. Decrease the advertised 'buf_alloc' taking count of bytes queued in > > the socket queue but not used. (e.g. 256 byte used on 4K available in > > the buffer) > > - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used. > > - This should be compatible also with old drivers. > > > > Maybe the second is less invasive, but will it be too tricky? > > Any other advice or suggestions? > > > > Thanks in advance, > > Stefano > > OK let me try to clarify. The idea is this: > > Let's say we queue a buffer of 4K, and we copy if len < 128 bytes. This > means that in the worst case (128 byte packets), each byte of credit in > the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need > to also account for the virtio_vsock_pkt since I think it's kept around > until userspace consumes it. > > Thus given X buf alloc allowed in the socket, we should publish X/16 > credits to the other side. This will ensure the other side does not send > more than X/16 bytes for a given socket and thus we won't need to > allocate more than X bytes to hold the data. > > We can play with the copy break value to tweak this. >Hi Michael, sorry for the long silence, but I focused on multi-transport. Before to implement your idea, I tried to do some calculations and looking better to our credit mechanism: buf_alloc = 256 KB (default, tunable through setsockopt) sizeof(struct virtio_vsock_pkt) = 128 - guest (we use preallocated 4 KB buffers to receive packets, copying small packet - < 128 -) worst_case = 129 buf_size = 4 KB credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32 credit_published = buf_alloc / credit2mem = ~8 KB Space for just 2 full packet (4 KB) - host (we copy packets from the vring, allocating the space for the payload) worst_case = 1 buf_size = 1 credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129 credit_published = buf_alloc / credit2mem = ~2 KB Less than a full packet (guest now can send up to 64 KB with a single packet, so it will be limited to 2 KB) Current memory consumption in the worst case if the RX queue is full: - guest mem = (buf_alloc / worst_case) * (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB - host mem = (buf_alloc / worst_case) * (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB I think that the performance with big packets will be affected, but I still have to try. Another approach that I want to explore is to play with buf_alloc published to the peer. One thing that's not clear to me yet is the meaning of SO_VM_SOCKETS_BUFFER_SIZE: - max amount of memory used in the RX queue - max amount of payload bytes in the RX queue (without overhead of struct virtio_vsock_pkt + preallocated buffer)>From the 'include/uapi/linux/vm_sockets.h':/* Option name for STREAM socket buffer size. Use as the option name in * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that * specifies the size of the buffer underlying a vSockets STREAM socket. * Value is clamped to the MIN and MAX. */ #define SO_VM_SOCKETS_BUFFER_SIZE 0 Regardless, I think we need to limit memory consumption in some way. I'll check the implementation of other transports, to understand better. I'll keep you updated! Thanks, Stefano
Reasonably Related Threads
- [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
- [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
- [RFC PATCH v1 06/12] vsock/virtio: non-linear skb handling for TAP dev
- [PATCH RFC 2/4] vhost/vsock: split packets to send using multiple buffers
- [PATCH v2 3/5] VSOCK: support receive mergeable rx buffer in guest