divide the throughput by the host CPU utilization
(measured by something like mpstat).
Sometimes throughput doesn't increase (e.g. guest-to-host)
but CPU utilization does decrease, so the normalized number
is still interesting.
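Something along these lines (just a sketch in Python; the idle
percentages are made up, the transaction rate is taken from the
table below):

    # Normalize netperf throughput by host CPU utilization.
    # cpu_idle_pct is %idle from something like 'mpstat 1', averaged
    # over the run across all host CPUs; throughput is whatever
    # netperf reports (Mbps for TCP_STREAM, trans/s for TCP_RR).
    def per_cpu(throughput, cpu_idle_pct):
        cpu_busy = 100.0 - cpu_idle_pct
        return throughput / cpu_busy   # units per percent of host CPU

    # Same ~9,990 trans/s, but at 80% vs. 70% host CPU busy
    # (hypothetical utilization numbers) is still a win:
    print(per_cpu(9990.37, 20.0))   # ~124.9 trans/s per % CPU
    print(per_cpu(9990.37, 30.0))   # ~142.7 trans/s per % CPU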
Another issue is that we are trying to improve the latency
of a busy queue here. However, the STREAM/MAERTS tests ignore latency
(more or less), while TCP_RR by default keeps only a single
transaction in flight per queue.
Without arguing about whether these are practically interesting
workloads, these results are thus unlikely to be significantly affected
by the optimization in question.
What we are interested in is therefore either TCP_RR with a -b flag
(requires netperf configured with --enable-burst) or multiple
concurrent TCP_RRs; see the sketch below.
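A minimal sketch of the kind of run I mean (Python; the host name,
instance count and burst size are placeholders, and -b assumes a
netperf built with --enable-burst):

    # Launch several concurrent TCP_RR instances, each keeping a burst
    # of transactions outstanding on its connection.
    import subprocess

    HOST = "guest-under-test"   # placeholder destination
    INSTANCES = 25              # matches the instance counts below
    BURST = 16                  # outstanding transactions per connection

    cmds = [
        ["netperf", "-t", "TCP_RR", "-H", HOST, "-l", "60",
         "--", "-r", "256,256", "-b", str(BURST)]
        for _ in range(INSTANCES)
    ]
    procs = [subprocess.Popen(c, stdout=subprocess.PIPE) for c in cmds]
    for p in procs:
        out, _ = p.communicate()
        # The last non-empty line of netperf's standard output
        # carries the per-instance transaction rate.
        print(out.decode().splitlines()[-1])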
> *** Local Guest-to-Guest ***
>
> Here's the local guest-to-guest summary for 1 VM pair doing TCP_RR with
> 256/256 request/response message size in transactions per second:
>
> Instances Base V0 V1 V2
> 1 8,151.56 8,460.72 8,439.16 9,990.37
> 25 48,761.74 51,032.62 51,103.25 49,533.52
> 50 55,687.38 55,974.18 56,854.10 54,888.65
> 100 58,255.06 58,255.86 60,380.90 59,308.36
>
> Here's the local guest-to-guest summary for 2 VM pairs doing TCP_RR with
> 256/256 request/response message size in transactions per second:
>
> Instances Base V0 V1 V2
> 1 18,758.48 19,112.50 18,597.07 19,252.04
> 25 80,500.50 78,801.78 80,590.68 78,782.07
> 50 80,594.20 77,985.44 80,431.72 77,246.90
> 100 82,023.23 81,325.96 81,303.32 81,727.54
>
> Here's the local guest-to-guest summary for 1 VM pair doing TCP_STREAM with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 961.78 1,115.92 794.02 740.37
> 4 2,498.33 2,541.82 2,441.60 2,308.26
>
> 1K:
> 1 3,476.61 3,522.02 2,170.86 1,395.57
> 4 6,344.30 7,056.57 7,275.16 7,174.09
>
> 4K:
> 1 9,213.57 10,647.44 9,883.42 9,007.29
> 4 11,070.66 11,300.37 11,001.02 12,103.72
>
> 16K:
> 1 12,065.94 9,437.78 11,710.60 6,989.93
> 4 12,755.28 13,050.78 12,518.06 13,227.33
>
> Here's the local guest-to-guest summary for 2 VM pairs doing TCP_STREAM with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 2,434.98 2,403.23 2,308.69 2,261.35
> 4 5,973.82 5,729.48 5,956.76 5,831.86
>
> 1K:
> 1 5,305.99 5,148.72 4,960.67 5,067.76
> 4 10,628.38 10,649.49 10,098.90 10,380.09
>
> 4K:
> 1 11,577.03 10,710.33 11,700.53 10,304.09
> 4 14,580.66 14,881.38 14,551.17 15,053.02
>
> 16K:
> 1 16,801.46 16,072.50 15,773.78 15,835.66
> 4 17,194.00 17,294.02 17,319.78 17,121.09
>
>
> *** Remote Host-to-Guest ***
>
> Here's the remote host-to-guest summary for 1 VM doing TCP_RR with
> 256/256 request/response message size in transactions per second:
>
> Instances Base V0 V1 V2
> 1 9,732.99 10,307.98 10,529.82 8,889.28
> 25 43,976.18 49,480.50 46,536.66 45,682.38
> 50 63,031.33 67,127.15 60,073.34 65,748.62
> 100 64,778.43 65,338.07 66,774.12 69,391.22
>
> Here's the remote host-to-guest summary for 4 VMs doing TCP_RR with
> 256/256 request/response message size in transactions per second:
>
> Instances Base V0 V1 V2
> 1 39,270.42 38,253.60 39,353.10 39,566.33
> 25 207,120.91 207,964.50 211,539.70 213,882.21
> 50 218,801.54 221,490.56 220,529.48 223,594.25
> 100 218,432.62 215,061.44 222,011.61 223,480.47
>
> Here's the remote host-to-guest summary for 1 VM doing TCP_STREAM with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 2,274.74 2,220.38 2,245.26 2,212.30
> 4 5,689.66 5,953.86 5,984.80 5,827.94
>
> 1K:
> 1 7,804.38 7,236.29 6,716.58 7,485.09
> 4 7,722.42 8,070.38 7,700.45 7,856.76
>
> 4K:
> 1 8,976.14 9,026.77 9,147.32 9,095.58
> 4 7,532.25 7,410.80 7,683.81 7,524.94
>
> 16K:
> 1 8,991.61 9,045.10 9,124.58 9,238.34
> 4 7,406.10 7,626.81 7,711.62 7,345.37
>
> Here's the remote host-to-guest summary for 1 VM doing TCP_MAERTS with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 1,165.69 1,181.92 1,152.20 1,104.68
> 4 2,580.46 2,545.22 2,436.30 2,601.74
>
> 1K:
> 1 2,393.34 2,457.22 2,128.86 2,258.92
> 4 7,152.57 7,606.60 8,004.64 7,576.85
>
> 4K:
> 1 9,258.93 8,505.06 9,309.78 9,215.05
> 4 9,374.20 9,363.48 9,372.53 9,352.00
>
> 16K:
> 1 9,244.70 9,287.72 9,298.60 9,322.28
> 4 9,380.02 9,347.50 9,377.46 9,372.98
>
> Here's the remote host-to-guest summary for 4 VMs doing TCP_STREAM with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 9,392.37 9,390.74 9,395.58 9,392.46
> 4 9,394.24 9,394.46 9,395.42 9,394.05
>
> 1K:
> 1 9,396.34 9,397.46 9,396.64 9,443.26
> 4 9,397.14 9,402.25 9,398.67 9,391.09
>
> 4K:
> 1 9,397.16 9,398.07 9,397.30 9,396.33
> 4 9,395.64 9,400.25 9,397.54 9,397.75
>
> 16K:
> 1 9,396.58 9,397.01 9,397.58 9,397.70
> 4 9,399.15 9,400.02 9,399.66 9,400.16
>
>
> Here's the remote host-to-guest summary for 4 VMs doing TCP_MAERTS with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances Base V0 V1 V2
> 1 5,048.66 5,007.26 5,074.98 4,974.86
> 4 9,217.23 9,245.14 9,263.97 9,294.23
>
> 1K:
> 1 9,378.32 9,387.12 9,386.21 9,361.55
> 4 9,384.42 9,384.02 9,385.50 9,385.55
>
> 4K:
> 1 9,391.10 9,390.28 9,389.70 9,391.02
> 4 9,384.38 9,383.39 9,384.74 9,384.19
>
> 16K:
> 1 9,390.77 9,389.62 9,388.07 9,388.19
> 4 9,381.86 9,382.37 9,385.54 9,383.88
>
>
> Tom
>
> > There's also this on top:
> > virtio-net-limit-xmit-polling/v3 -> don't delay avail index update
> > I don't think it's important to test this one, yet
> >
> > Userspace to use: event index work is not yet merged upstream
> > so the revision to use is still this:
> > git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu-kvm.git
> > virtio-net-event-idx-v3