thr3ads.net - Virtualization - virtio-blk: should num_vqs be limited by num_possible

If this information is useful, please help other people find it:
Share via:

Dongli Zhang

2019-Mar-13 03:26 UTC

virtio-blk: should num_vqs be limited by num_possible_cpus()?

On 3/13/19 1:33 AM, Cornelia Huck wrote:> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> Dongli Zhang <dongli.zhang at oracle.com> wrote:
> 
>> I observed that there is one msix vector for config and one shared
vector
>> for all queues in below qemu cmdline, when the num-queues for
virtio-blk
>> is more than the number of possible cpus:
>>
>> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
>>
>> # cat /proc/interrupts 
>>            CPU0       CPU1       CPU2       CPU3
>> ... ...
>>  24:          0          0          0          0   PCI-MSI 65536-edge  
virtio0-config
>>  25:          0          0          0         59   PCI-MSI 65537-edge  
virtio0-virtqueues
>> ... ...
>>
>>
>> However, when num-queues is the same as number of possible cpus:
>>
>> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>
>> # cat /proc/interrupts 
>>            CPU0       CPU1       CPU2       CPU3
>> ... ... 
>>  24:          0          0          0          0   PCI-MSI 65536-edge  
virtio0-config
>>  25:          2          0          0          0   PCI-MSI 65537-edge  
virtio0-req.0
>>  26:          0         35          0          0   PCI-MSI 65538-edge  
virtio0-req.1
>>  27:          0          0         32          0   PCI-MSI 65539-edge  
virtio0-req.2
>>  28:          0          0          0          0   PCI-MSI 65540-edge  
virtio0-req.3
>> ... ...
>>
>> In above case, there is one msix vector per queue.
> 
> Please note that this is pci-specific...
> 
>>
>>
>> This is because the max number of queues is not limited by the number
of
>> possible cpus.
>>
>> By default, nvme (regardless about write_queues and poll_queues) and
>> xen-blkfront limit the number of queues with num_possible_cpus().
> 
> ...and these are probably pci-specific as well.
Not pci-specific, but per-cpu as well.
> 
>>
>>
>> Is this by design on purpose, or can we fix with below?
>>
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 4bc083b..df95ce3 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>  	if (err)
>>  		num_vqs = 1;
>>  
>> +	num_vqs = min(num_possible_cpus(), num_vqs);
>> +
>>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs),
GFP_KERNEL);
>>  	if (!vblk->vqs)
>>  		return -ENOMEM;
> 
> virtio-blk, however, is not pci-specific.
> 
> If we are using the ccw transport on s390, a completely different
> interrupt mechanism is in use ('floating' interrupts, which are not
> per-cpu). A check like that should therefore not go into the generic
> driver.
> 
So far there seems two options.

The 1st option is to ask the qemu user to always specify "-num-queues"
with the
same number of vcpus when running x86 guest with pci for virtio-blk or
virtio-scsi, in order to assign a vector for each queue.

Or, is it fine for virtio folks to add a new hook to 'struct
virtio_config_ops'
so that different platforms (e.g., pci or ccw) would use different ways to limit
the max number of queues in guest, with something like below?

So far both virtio-scsi and virtio-blk would benefit from the new hook.

---
 drivers/block/virtio_blk.c         |  2 ++
 drivers/virtio/virtio_pci_common.c |  6 ++++++
 drivers/virtio/virtio_pci_common.h |  2 ++
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 include/linux/virtio_config.h      | 11 +++++++++++
 6 files changed, 24 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b..93cfeda 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
 	if (err)
 		num_vqs = 1;

+	num_vqs = virtio_calc_num_vqs(vdev, num_vqs);
+
 	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
 	if (!vblk->vqs)
 		return -ENOMEM;
diff --git a/drivers/virtio/virtio_pci_common.c
b/drivers/virtio/virtio_pci_common.c
index d0584c0..ce021d1 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -409,6 +409,12 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 	return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
 }

+/* the config->calc_num_vqs() implementation */
+unsigned short vp_calc_num_vqs(unsigned short num_vqs)
+{
+	return min_t(unsigned short, num_possible_cpus(), num_vqs);
+}
+
 const char *vp_bus_name(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
diff --git a/drivers/virtio/virtio_pci_common.h
b/drivers/virtio/virtio_pci_common.h
index 0227100..cc5ac80 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -134,6 +134,8 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 		struct virtqueue *vqs[], vq_callback_t *callbacks[],
 		const char * const names[], const bool *ctx,
 		struct irq_affinity *desc);
+/* the config->calc_num_vqs() implementation */
+unsigned short vp_calc_num_vqs(unsigned short num_vqs);
 const char *vp_bus_name(struct virtio_device *vdev);

 /* Setup the affinity for a virtqueue:
diff --git a/drivers/virtio/virtio_pci_legacy.c
b/drivers/virtio/virtio_pci_legacy.c
index eff9ddc..69d1050 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -209,6 +209,7 @@ static const struct virtio_config_ops virtio_pci_config_ops
= {
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
 	.get_vq_affinity = vp_get_vq_affinity,
+	.calc_num_vqs = vp_calc_num_vqs,
 };

 /* the PCI probing function */
diff --git a/drivers/virtio/virtio_pci_modern.c
b/drivers/virtio/virtio_pci_modern.c
index 07571da..f04e44a 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -460,6 +460,7 @@ static const struct virtio_config_ops
virtio_pci_config_nodev_ops = {
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
 	.get_vq_affinity = vp_get_vq_affinity,
+	.calc_num_vqs = vp_calc_num_vqs,
 };

 static const struct virtio_config_ops virtio_pci_config_ops = {
@@ -476,6 +477,7 @@ static const struct virtio_config_ops virtio_pci_config_ops
= {
 	.bus_name	= vp_bus_name,
 	.set_vq_affinity = vp_set_vq_affinity,
 	.get_vq_affinity = vp_get_vq_affinity,
+	.calc_num_vqs = vp_calc_num_vqs,
 };

 /**
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bb4cc49..f32368b 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -65,6 +65,7 @@ struct irq_affinity;
  *      the caller can then copy.
  * @set_vq_affinity: set the affinity for a virtqueue (optional).
  * @get_vq_affinity: get the affinity for a virtqueue (optional).
+ * @calc_num_vqs: calculate the number of virtqueue (optional)
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -88,6 +89,7 @@ struct virtio_config_ops {
 			       const struct cpumask *cpu_mask);
 	const struct cpumask *(*get_vq_affinity)(struct virtio_device *vdev,
 			int index);
+	unsigned short (*calc_num_vqs)(unsigned short num_vqs);
 };

 /* If driver didn't advertise the feature, it will never appear. */
@@ -207,6 +209,15 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev,
unsigned nvqs,
 				      desc);
 }

+static inline
+unsigned short virtio_calc_num_vqs(struct virtio_device *vdev,
+				   unsigned short num_vqs)
+{
+	if (vdev->config->calc_num_vqs)
+		return vdev->config->calc_num_vqs(num_vqs);
+	return num_vqs;
+}
+
 /**
  * virtio_device_ready - enable vq use in probe function
  * @vdev: the device
-- 
2.7.4


Thank you very much!

Dongli Zhang

Cornelia Huck

2019-Mar-13 09:39 UTC

head link

virtio-blk: should num_vqs be limited by num_possible_cpus()?

On Wed, 13 Mar 2019 11:26:04 +0800
Dongli Zhang <dongli.zhang at oracle.com> wrote:
> On 3/13/19 1:33 AM, Cornelia Huck wrote:
> > On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
> > Dongli Zhang <dongli.zhang at oracle.com> wrote:
> >   
> >> I observed that there is one msix vector for config and one shared
vector
> >> for all queues in below qemu cmdline, when the num-queues for
virtio-blk
> >> is more than the number of possible cpus:
> >>
> >> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
> >>
> >> # cat /proc/interrupts 
> >>            CPU0       CPU1       CPU2       CPU3
> >> ... ...
> >>  24:          0          0          0          0   PCI-MSI
65536-edge      virtio0-config
> >>  25:          0          0          0         59   PCI-MSI
65537-edge      virtio0-virtqueues
> >> ... ...
> >>
> >>
> >> However, when num-queues is the same as number of possible cpus:
> >>
> >> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
> >>
> >> # cat /proc/interrupts 
> >>            CPU0       CPU1       CPU2       CPU3
> >> ... ... 
> >>  24:          0          0          0          0   PCI-MSI
65536-edge      virtio0-config
> >>  25:          2          0          0          0   PCI-MSI
65537-edge      virtio0-req.0
> >>  26:          0         35          0          0   PCI-MSI
65538-edge      virtio0-req.1
> >>  27:          0          0         32          0   PCI-MSI
65539-edge      virtio0-req.2
> >>  28:          0          0          0          0   PCI-MSI
65540-edge      virtio0-req.3
> >> ... ...
> >>
> >> In above case, there is one msix vector per queue.  
> > 
> > Please note that this is pci-specific...
> >   
> >>
> >>
> >> This is because the max number of queues is not limited by the
number of
> >> possible cpus.
> >>
> >> By default, nvme (regardless about write_queues and poll_queues)
and
> >> xen-blkfront limit the number of queues with num_possible_cpus().
> > 
> > ...and these are probably pci-specific as well.  
> 
> Not pci-specific, but per-cpu as well.
Ah, I meant that those are pci devices.
> 
> >   
> >>
> >>
> >> Is this by design on purpose, or can we fix with below?
> >>
> >>
> >> diff --git a/drivers/block/virtio_blk.c
b/drivers/block/virtio_blk.c
> >> index 4bc083b..df95ce3 100644
> >> --- a/drivers/block/virtio_blk.c
> >> +++ b/drivers/block/virtio_blk.c
> >> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
> >>  	if (err)
> >>  		num_vqs = 1;
> >>  
> >> +	num_vqs = min(num_possible_cpus(), num_vqs);
> >> +
> >>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs),
GFP_KERNEL);
> >>  	if (!vblk->vqs)
> >>  		return -ENOMEM;  
> > 
> > virtio-blk, however, is not pci-specific.
> > 
> > If we are using the ccw transport on s390, a completely different
> > interrupt mechanism is in use ('floating' interrupts, which
are not
> > per-cpu). A check like that should therefore not go into the generic
> > driver.
> >   
> 
> So far there seems two options.
> 
> The 1st option is to ask the qemu user to always specify
"-num-queues" with the
> same number of vcpus when running x86 guest with pci for virtio-blk or
> virtio-scsi, in order to assign a vector for each queue.
That does seem like an extra burden for the user: IIUC, things work
even if you have too many queues, it's just not optimal. It sounds like
something that can be done by a management layer (e.g. libvirt), though.
> Or, is it fine for virtio folks to add a new hook to 'struct
virtio_config_ops'
> so that different platforms (e.g., pci or ccw) would use different ways to
limit
> the max number of queues in guest, with something like below?
That sounds better, as both transports and drivers can opt-in here.

However, maybe it would be even better to try to come up with a better
strategy of allocating msix vectors in virtio-pci. More vectors in the
num_queues > num_cpus case, even if they still need to be shared?
Individual vectors for n-1 cpus and then a shared one for the remaining
queues?

It might even be device-specific: Have some low-traffic status queues
share a vector, and provide an individual vector for high-traffic
queues. Would need some device<->transport interface, obviously.

Dongli Zhang

2019-Mar-14 06:12 UTC

head link

virtio-blk: should num_vqs be limited by num_possible_cpus()?

On 3/13/19 5:39 PM, Cornelia Huck wrote:> On Wed, 13 Mar 2019 11:26:04 +0800
> Dongli Zhang <dongli.zhang at oracle.com> wrote:
> 
>> On 3/13/19 1:33 AM, Cornelia Huck wrote:
>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT)
>>> Dongli Zhang <dongli.zhang at oracle.com> wrote:
>>>   
>>>> I observed that there is one msix vector for config and one
shared vector
>>>> for all queues in below qemu cmdline, when the num-queues for
virtio-blk
>>>> is more than the number of possible cpus:
>>>>
>>>> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6"
>>>>
>>>> # cat /proc/interrupts 
>>>>            CPU0       CPU1       CPU2       CPU3
>>>> ... ...
>>>>  24:          0          0          0          0   PCI-MSI
65536-edge      virtio0-config
>>>>  25:          0          0          0         59   PCI-MSI
65537-edge      virtio0-virtqueues
>>>> ... ...
>>>>
>>>>
>>>> However, when num-queues is the same as number of possible
cpus:
>>>>
>>>> qemu: "-smp 4" while "-device
virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4"
>>>>
>>>> # cat /proc/interrupts 
>>>>            CPU0       CPU1       CPU2       CPU3
>>>> ... ... 
>>>>  24:          0          0          0          0   PCI-MSI
65536-edge      virtio0-config
>>>>  25:          2          0          0          0   PCI-MSI
65537-edge      virtio0-req.0
>>>>  26:          0         35          0          0   PCI-MSI
65538-edge      virtio0-req.1
>>>>  27:          0          0         32          0   PCI-MSI
65539-edge      virtio0-req.2
>>>>  28:          0          0          0          0   PCI-MSI
65540-edge      virtio0-req.3
>>>> ... ...
>>>>
>>>> In above case, there is one msix vector per queue.  
>>>
>>> Please note that this is pci-specific...
>>>   
>>>>
>>>>
>>>> This is because the max number of queues is not limited by the
number of
>>>> possible cpus.
>>>>
>>>> By default, nvme (regardless about write_queues and
poll_queues) and
>>>> xen-blkfront limit the number of queues with
num_possible_cpus().
>>>
>>> ...and these are probably pci-specific as well.  
>>
>> Not pci-specific, but per-cpu as well.
> 
> Ah, I meant that those are pci devices.
> 
>>
>>>   
>>>>
>>>>
>>>> Is this by design on purpose, or can we fix with below?
>>>>
>>>>
>>>> diff --git a/drivers/block/virtio_blk.c
b/drivers/block/virtio_blk.c
>>>> index 4bc083b..df95ce3 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk)
>>>>  	if (err)
>>>>  		num_vqs = 1;
>>>>  
>>>> +	num_vqs = min(num_possible_cpus(), num_vqs);
>>>> +
>>>>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs),
GFP_KERNEL);
>>>>  	if (!vblk->vqs)
>>>>  		return -ENOMEM;  
>>>
>>> virtio-blk, however, is not pci-specific.
>>>
>>> If we are using the ccw transport on s390, a completely different
>>> interrupt mechanism is in use ('floating' interrupts, which
are not
>>> per-cpu). A check like that should therefore not go into the
generic
>>> driver.
>>>   
>>
>> So far there seems two options.
>>
>> The 1st option is to ask the qemu user to always specify
"-num-queues" with the
>> same number of vcpus when running x86 guest with pci for virtio-blk or
>> virtio-scsi, in order to assign a vector for each queue.
> 
> That does seem like an extra burden for the user: IIUC, things work
> even if you have too many queues, it's just not optimal. It sounds like
> something that can be done by a management layer (e.g. libvirt), though.
> 
>> Or, is it fine for virtio folks to add a new hook to 'struct
virtio_config_ops'
>> so that different platforms (e.g., pci or ccw) would use different ways
to limit
>> the max number of queues in guest, with something like below?
> 
> That sounds better, as both transports and drivers can opt-in here.
> 
> However, maybe it would be even better to try to come up with a better
> strategy of allocating msix vectors in virtio-pci. More vectors in the
> num_queues > num_cpus case, even if they still need to be shared?
> Individual vectors for n-1 cpus and then a shared one for the remaining
> queues?
> 
> It might even be device-specific: Have some low-traffic status queues
> share a vector, and provide an individual vector for high-traffic
> queues. Would need some device<->transport interface, obviously.
> 
This sounds a little bit similar to multiple hctx maps?

So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw
queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most
nr_cpu_ids hw queues.

2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
... ...
3021         /*
3022          * There is no use for more h/w queues than cpus if we just have
3023          * a single map
3024          */
3025         if (set->nr_maps == 1 && set->nr_hw_queues >
nr_cpu_ids)
3026                 set->nr_hw_queues = nr_cpu_ids;

Even the block layer would limit the number of hw queues by nr_cpu_ids when
(set->nr_maps == 1).

That's why I think virtio-blk should use the similar solution as nvme
(regardless about write_queues and poll_queues) and xen-blkfront.

Added Jason again. I do not know why the mailing list of
virtualization at lists.linux-foundation.org always filters out Jason's
email...


Dongli Zhang

Maybe Matching Threads

Search for more possibly parallel threads

Virtualization - Mar 2019 - virtio-blk: should num_vqs be limited by num_possible_cpus()?

virtio-blk: should num_vqs be limited by num_possible_cpus()?

virtio-blk: should num_vqs be limited by num_possible_cpus()?

virtio-blk: should num_vqs be limited by num_possible_cpus()?

Maybe Matching Threads