On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
>
> On 2020/6/25 ??9:57, Stefan Hajnoczi wrote:
> > These patches are not ready to be merged because I was unable to measure a
> > performance improvement. I'm publishing them so they are archived in case
> > someone picks up this work again in the future.
> >
> > The goal of these patches is to allocate virtqueues and driver state from the
> > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > topology and virtio devices spread across vNUMA nodes benefit from this. In
> > other cases the memory placement is fine and we don't need to take NUMA into
> > account inside the guest.
> >
> > These patches could be extended to virtio_net.ko and other devices in the
> > future. I only tested virtio_blk.ko.
> >
> > The benchmark configuration was designed to trigger worst-case NUMA placement:
> > * Physical NVMe storage controller on host NUMA node 0
> > * IOThread pinned to host NUMA node 0
> > * virtio-blk-pci device in vNUMA node 1
> > * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> > * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> >
> > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci device.
> > Applying these patches fixes memory placement so that virtqueues and driver
> > state are allocated in vNUMA node 1, where the virtio-blk-pci device is located.
> >
> > The fio 4KB randread benchmark results do not show a significant improvement:
> >
> > Name             IOPS      Error
> > virtio-blk       42373.79  ± 0.54%
> > virtio-blk-numa  42517.07  ± 0.79%
>
> I remember I did something similar in vhost by using page_to_nid() for
> the descriptor ring, and I got little improvement, as shown here.
>
> Michael reminded me that it was probably because all the data was cached,
> so I wonder whether the test lacks sufficient stress on the cache ...

Yes, that sounds likely. If there's no real-world performance
improvement then I'm happy to leave these patches unmerged.

Stefan
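For anyone who picks this work up again: the series boils down to passing the
device's NUMA node into the driver's allocations instead of relying on the node
of the CPU that happens to run .probe(). Below is a minimal sketch of that idea
using generic driver-core APIs; the state struct and probe function names are
illustrative and not taken from the actual patches.

#include <linux/device.h>	/* dev_to_node() */
#include <linux/errno.h>
#include <linux/slab.h>		/* kzalloc_node() */
#include <linux/virtio.h>	/* struct virtio_device */

/* Illustrative driver state; stands in for e.g. struct virtio_blk. */
struct my_vdev_state {
	struct virtio_device *vdev;
	/* ... queues, locks, etc. ... */
};

static int my_probe(struct virtio_device *vdev)
{
	/*
	 * Allocate on the NUMA node the device is attached to rather
	 * than on the node of the CPU running .probe().
	 */
	int node = dev_to_node(&vdev->dev);
	struct my_vdev_state *st;

	st = kzalloc_node(sizeof(*st), GFP_KERNEL, node);
	if (!st)
		return -ENOMEM;

	st->vdev = vdev;
	vdev->priv = st;

	/* Virtqueue/ring allocation would need to honour 'node' as well. */
	return 0;
}

On a guest without NUMA information dev_to_node() returns NUMA_NO_NODE, so an
allocation like this degrades gracefully to the usual local allocation.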
On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> >
> > On 2020/6/25 ??9:57, Stefan Hajnoczi wrote:
> > > These patches are not ready to be merged because I was unable to measure a
> > > performance improvement. I'm publishing them so they are archived in case
> > > someone picks up this work again in the future.
> > >
> > > The goal of these patches is to allocate virtqueues and driver state from the
> > > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > > topology and virtio devices spread across vNUMA nodes benefit from this. In
> > > other cases the memory placement is fine and we don't need to take NUMA into
> > > account inside the guest.
> > >
> > > These patches could be extended to virtio_net.ko and other devices in the
> > > future. I only tested virtio_blk.ko.
> > >
> > > The benchmark configuration was designed to trigger worst-case NUMA placement:
> > > * Physical NVMe storage controller on host NUMA node 0

It's possible that NUMA is not such a big deal for NVMe.
And it's possible that the BIOS misconfigures ACPI and reports NUMA
placement incorrectly.
I think the best thing to try is a ramdisk on a specific NUMA node.

> > > * IOThread pinned to host NUMA node 0
> > > * virtio-blk-pci device in vNUMA node 1
> > > * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> > > * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > >
> > > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci device.
> > > Applying these patches fixes memory placement so that virtqueues and driver
> > > state are allocated in vNUMA node 1, where the virtio-blk-pci device is located.
> > >
> > > The fio 4KB randread benchmark results do not show a significant improvement:
> > >
> > > Name             IOPS      Error
> > > virtio-blk       42373.79  ± 0.54%
> > > virtio-blk-numa  42517.07  ± 0.79%
> >
> > I remember I did something similar in vhost by using page_to_nid() for
> > the descriptor ring, and I got little improvement, as shown here.
> >
> > Michael reminded me that it was probably because all the data was cached,
> > so I wonder whether the test lacks sufficient stress on the cache ...
>
> Yes, that sounds likely. If there's no real-world performance
> improvement then I'm happy to leave these patches unmerged.
>
> Stefan

Well, that was for vhost though. This is virtio, which is different.

Doesn't some benchmark put pressure on the CPU cache?
I kind of feel there should be a difference, and the fact that there
isn't means there's some other bottleneck somewhere. Might be worth
figuring out.

--
MST
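Before hunting for other bottlenecks, one sanity check is to confirm that the
ring memory really did move to the device's node. Here is a small debugging
sketch (not part of the series) along the lines of the page_to_nid() check
Jason mentions; it assumes the ring address sits in the kernel's linear
mapping, and memory obtained through the DMA API may need different handling.

#include <linux/mm.h>		/* virt_to_page(), page_to_nid() */
#include <linux/printk.h>

/*
 * Debugging aid: report which NUMA node a ring's backing page sits on,
 * so it can be compared against the device's node.  'ring_va' must be
 * an address in the kernel linear map (e.g. the descriptor array of a
 * split ring allocated without the DMA API).
 */
static void report_ring_node(const char *name, void *ring_va, int dev_node)
{
	int ring_node = page_to_nid(virt_to_page(ring_va));

	pr_info("%s: ring on node %d, device on node %d%s\n",
		name, ring_node, dev_node,
		ring_node == dev_node ? "" : " (mismatch)");
}

The same check can be applied to the driver state allocation to verify that a
node-aware allocation actually took effect.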
On Mon, Jun 29, 2020 at 11:28:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> > On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> > >
> > > On 2020/6/25 ??9:57, Stefan Hajnoczi wrote:
> > > > These patches are not ready to be merged because I was unable to measure a
> > > > performance improvement. I'm publishing them so they are archived in case
> > > > someone picks up this work again in the future.
> > > >
> > > > The goal of these patches is to allocate virtqueues and driver state from the
> > > > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > > > topology and virtio devices spread across vNUMA nodes benefit from this. In
> > > > other cases the memory placement is fine and we don't need to take NUMA into
> > > > account inside the guest.
> > > >
> > > > These patches could be extended to virtio_net.ko and other devices in the
> > > > future. I only tested virtio_blk.ko.
> > > >
> > > > The benchmark configuration was designed to trigger worst-case NUMA placement:
> > > > * Physical NVMe storage controller on host NUMA node 0
>
> It's possible that NUMA is not such a big deal for NVMe.
> And it's possible that the BIOS misconfigures ACPI and reports NUMA
> placement incorrectly.
> I think the best thing to try is a ramdisk on a specific NUMA node.

Using a ramdisk is an interesting idea, thanks.

Stefan
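As a rough illustration of the ramdisk suggestion (not a recipe from this
thread): the point is to keep the backing pages on one known host node so the
device-to-memory distance is controlled. In userspace the same primitive looks
like this with libnuma; the node number and buffer size are arbitrary.

/* Pin a buffer to a chosen NUMA node with libnuma (link with -lnuma). */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	const int node = 0;			/* host node to pin to */
	const size_t len = 64UL << 20;		/* 64 MiB test buffer  */
	void *buf;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is not supported on this system\n");
		return 1;
	}

	buf = numa_alloc_onnode(len, node);
	if (!buf) {
		fprintf(stderr, "numa_alloc_onnode failed\n");
		return 1;
	}

	/* Touch the memory so pages are actually allocated on 'node'. */
	memset(buf, 0, len);

	printf("allocated %zu bytes on node %d\n", len, node);
	numa_free(buf, len);
	return 0;
}

A host ramdisk bound to a node applies the same idea to the block device's
backing memory, which keeps the storage side of the benchmark on a known node.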