Ming Lei
2014-May-30 03:34 UTC
[PATCH] block: virtio_blk: don't hold spin lock during world switch
On Fri, May 30, 2014 at 11:19 AM, Jens Axboe <axboe at kernel.dk> wrote:
> On 2014-05-29 20:49, Ming Lei wrote:
>>
>> Firstly, it isn't necessary to hold lock of vblk->vq_lock
>> when notifying hypervisor about queued I/O.
>>
>> Secondly, virtqueue_notify() will cause world switch and
>> it may take long time on some hypervisors(such as, qemu-arm),
>> so it isn't good to hold the lock and block other vCPUs.
>>
>> On arm64 quad core VM(qemu-kvm), the patch can increase I/O
>> performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
>> - without the patch: 14K IOPS
>> - with the patch: 34K IOPS
>
>
> Patch looks good to me. I don't see a hit on my qemu-kvm testing, but it
> definitely makes sense and I can see it hurting in other places.

It isn't easy to observe the improvement on x86 VM, especially
with few vCPUs, because qemu-system-x86_64 only takes
several microseconds to handle the notification, but on arm64, it
may take hundreds of microseconds, so the improvement is
obvious on arm VM.

I hope this patch can be merged, at least arm VM can benefit
from it.

Thanks,
--
Ming Lei
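For readers following the locking argument in the patch description, here is a minimal userspace sketch of the pattern, not the patch itself. It decides under the lock whether a notification is needed, then performs the slow notification only after the lock is dropped. Everything named toy_* is hypothetical and invented for illustration; a pthread mutex stands in for vblk->vq_lock, and the prepare/notify split is only loosely modelled on the kernel's virtqueue_kick_prepare() and virtqueue_notify().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue; NOT the kernel's struct virtqueue. */
struct toy_queue {
    pthread_mutex_t lock;       /* plays the role of vblk->vq_lock */
    int pending;                /* requests queued since the last kick; lock-protected */
    atomic_int notifications;   /* updated outside the lock, hence atomic */
};

/* Per-thread flag so the sketch can check that notify runs unlocked. */
static _Thread_local bool holding_queue_lock;

void toy_queue_init(struct toy_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->pending = 0;
    atomic_init(&q->notifications, 0);
}

/* Cheap step, called with the lock held: is a notification needed? */
static bool toy_kick_prepare(struct toy_queue *q)
{
    bool need = q->pending > 0;
    q->pending = 0;
    return need;
}

/* Expensive step (the "world switch" in the thread): must run unlocked. */
static void toy_notify(struct toy_queue *q)
{
    assert(!holding_queue_lock);
    atomic_fetch_add(&q->notifications, 1);
}

/* Queue one request; 'last' marks the end of a batch. Returns true if
 * the other side was notified. */
bool toy_queue_request(struct toy_queue *q, bool last)
{
    bool notify = false;

    pthread_mutex_lock(&q->lock);
    holding_queue_lock = true;
    q->pending++;
    if (last && toy_kick_prepare(q))
        notify = true;
    holding_queue_lock = false;
    pthread_mutex_unlock(&q->lock);

    /* The lock is already released, so other threads can queue
     * requests while this potentially slow call is in flight. */
    if (notify)
        toy_notify(q);
    return notify;
}
```

A caller would run toy_queue_init() once and then call toy_queue_request() per request. The point of the split is that the time spent in the notification no longer adds to the lock hold time, which is the cost the patch description attributes to slow hypervisors.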
Jens Axboe
2014-May-30 03:35 UTC
[PATCH] block: virtio_blk: don't hold spin lock during world switch
On 2014-05-29 21:34, Ming Lei wrote:
> On Fri, May 30, 2014 at 11:19 AM, Jens Axboe <axboe at kernel.dk> wrote:
>> On 2014-05-29 20:49, Ming Lei wrote:
>>>
>>> Firstly, it isn't necessary to hold lock of vblk->vq_lock
>>> when notifying hypervisor about queued I/O.
>>>
>>> Secondly, virtqueue_notify() will cause world switch and
>>> it may take long time on some hypervisors(such as, qemu-arm),
>>> so it isn't good to hold the lock and block other vCPUs.
>>>
>>> On arm64 quad core VM(qemu-kvm), the patch can increase I/O
>>> performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
>>> - without the patch: 14K IOPS
>>> - with the patch: 34K IOPS
>>
>>
>> Patch looks good to me. I don't see a hit on my qemu-kvm testing, but it
>> definitely makes sense and I can see it hurting in other places.
>
> It isn't easy to observe the improvement on x86 VM, especially
> with few vCPUs, because qemu-system-x86_64 only takes
> several microseconds to handle the notification, but on arm64, it
> may take hundreds of microseconds, so the improvement is
> obvious on arm VM.
>
> I hope this patch can be merged, at least arm VM can benefit
> from it.

If Rusty agrees, I'd like to add it for 3.16 with a stable marker.

--
Jens Axboe
Ming Lei
2014-May-30 05:58 UTC
[PATCH] block: virtio_blk: don't hold spin lock during world switch
On Fri, May 30, 2014 at 11:35 AM, Jens Axboe <axboe at kernel.dk> wrote:
> On 2014-05-29 21:34, Ming Lei wrote:
>>
>> On Fri, May 30, 2014 at 11:19 AM, Jens Axboe <axboe at kernel.dk> wrote:
>>>
>>> On 2014-05-29 20:49, Ming Lei wrote:
>>>>
>>>>
>>>> Firstly, it isn't necessary to hold lock of vblk->vq_lock
>>>> when notifying hypervisor about queued I/O.
>>>>
>>>> Secondly, virtqueue_notify() will cause world switch and
>>>> it may take long time on some hypervisors(such as, qemu-arm),
>>>> so it isn't good to hold the lock and block other vCPUs.
>>>>
>>>> On arm64 quad core VM(qemu-kvm), the patch can increase I/O
>>>> performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
>>>> - without the patch: 14K IOPS
>>>> - with the patch: 34K IOPS
>>>
>>>
>>>
>>> Patch looks good to me. I don't see a hit on my qemu-kvm testing, but it
>>> definitely makes sense and I can see it hurting in other places.
>>
>>
>> It isn't easy to observe the improvement on x86 VM, especially
>> with few vCPUs, because qemu-system-x86_64 only takes
>> several microseconds to handle the notification, but on arm64, it
>> may take hundreds of microseconds, so the improvement is
>> obvious on arm VM.
>>
>> I hope this patch can be merged, at least arm VM can benefit
>> from it.
>
>
> If Rusty agrees, I'd like to add it for 3.16 with a stable marker.

Interesting, even on x86, I still can observe the improvement
when the numjobs is set as 2 in the fio script(see commit log), but
when numjobs is set as 4, 8, 12, the difference isn't obvious between
patched kernel and non-patched kernel.

1, environment
- host: 2sockets, each CPU(4cores, 2 threads), total 16 logical cores
- guest: 16cores, 8GB ram
- guest kernel: 3.15-rc7-next with patch[1]
- fio: the script in commit log with numjobs set as 2

2, result
- without the patch: ~104K IOPS
- with the patch: ~140K IOPS

Rusty, considered the same trick has been applied in virt-scsi,
do you agree to take the same approach in virt-blk too?

[1], http://marc.info/?l=linux-kernel&m=140135041423441&w=2

Thanks,
--
Ming Lei
Rusty Russell
2014-May-30 06:10 UTC
[PATCH] block: virtio_blk: don't hold spin lock during world switch
Jens Axboe <axboe at kernel.dk> writes:
> If Rusty agrees, I'd like to add it for 3.16 with a stable marker.

Really stable? It improves performance, which is nice.

But every patch which goes into the kernel fixes a bug, improves
clarity, improves performance or adds a feature. I've now seen all
four cases get CC'd into stable.

Including some of mine explicitly not marked stable which get swept up
by enthusiastic stable maintainers :(

Is now there *any* patch short of a major rewrite which shouldn't get
cc: stable?

Cheers,
Rusty.