Dongli Zhang
2019-Mar-14 06:12 UTC
virtio-blk: should num_vqs be limited by num_possible_cpus()?
On 3/13/19 5:39 PM, Cornelia Huck wrote:> On Wed, 13 Mar 2019 11:26:04 +0800 > Dongli Zhang <dongli.zhang at oracle.com> wrote: > >> On 3/13/19 1:33 AM, Cornelia Huck wrote: >>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT) >>> Dongli Zhang <dongli.zhang at oracle.com> wrote: >>> >>>> I observed that there is one msix vector for config and one shared vector >>>> for all queues in below qemu cmdline, when the num-queues for virtio-blk >>>> is more than the number of possible cpus: >>>> >>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6" >>>> >>>> # cat /proc/interrupts >>>> CPU0 CPU1 CPU2 CPU3 >>>> ... ... >>>> 24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config >>>> 25: 0 0 0 59 PCI-MSI 65537-edge virtio0-virtqueues >>>> ... ... >>>> >>>> >>>> However, when num-queues is the same as number of possible cpus: >>>> >>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4" >>>> >>>> # cat /proc/interrupts >>>> CPU0 CPU1 CPU2 CPU3 >>>> ... ... >>>> 24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config >>>> 25: 2 0 0 0 PCI-MSI 65537-edge virtio0-req.0 >>>> 26: 0 35 0 0 PCI-MSI 65538-edge virtio0-req.1 >>>> 27: 0 0 32 0 PCI-MSI 65539-edge virtio0-req.2 >>>> 28: 0 0 0 0 PCI-MSI 65540-edge virtio0-req.3 >>>> ... ... >>>> >>>> In above case, there is one msix vector per queue. >>> >>> Please note that this is pci-specific... >>> >>>> >>>> >>>> This is because the max number of queues is not limited by the number of >>>> possible cpus. >>>> >>>> By default, nvme (regardless about write_queues and poll_queues) and >>>> xen-blkfront limit the number of queues with num_possible_cpus(). >>> >>> ...and these are probably pci-specific as well. >> >> Not pci-specific, but per-cpu as well. > > Ah, I meant that those are pci devices. > >> >>> >>>> >>>> >>>> Is this by design on purpose, or can we fix with below? >>>> >>>> >>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c >>>> index 4bc083b..df95ce3 100644 >>>> --- a/drivers/block/virtio_blk.c >>>> +++ b/drivers/block/virtio_blk.c >>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk) >>>> if (err) >>>> num_vqs = 1; >>>> >>>> + num_vqs = min(num_possible_cpus(), num_vqs); >>>> + >>>> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL); >>>> if (!vblk->vqs) >>>> return -ENOMEM; >>> >>> virtio-blk, however, is not pci-specific. >>> >>> If we are using the ccw transport on s390, a completely different >>> interrupt mechanism is in use ('floating' interrupts, which are not >>> per-cpu). A check like that should therefore not go into the generic >>> driver. >>> >> >> So far there seems two options. >> >> The 1st option is to ask the qemu user to always specify "-num-queues" with the >> same number of vcpus when running x86 guest with pci for virtio-blk or >> virtio-scsi, in order to assign a vector for each queue. > > That does seem like an extra burden for the user: IIUC, things work > even if you have too many queues, it's just not optimal. It sounds like > something that can be done by a management layer (e.g. libvirt), though. > >> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops' >> so that different platforms (e.g., pci or ccw) would use different ways to limit >> the max number of queues in guest, with something like below? > > That sounds better, as both transports and drivers can opt-in here. > > However, maybe it would be even better to try to come up with a better > strategy of allocating msix vectors in virtio-pci. More vectors in the > num_queues > num_cpus case, even if they still need to be shared? > Individual vectors for n-1 cpus and then a shared one for the remaining > queues? > > It might even be device-specific: Have some low-traffic status queues > share a vector, and provide an individual vector for high-traffic > queues. Would need some device<->transport interface, obviously. >This sounds a little bit similar to multiple hctx maps? So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most nr_cpu_ids hw queues. 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set) ... ... 3021 /* 3022 * There is no use for more h/w queues than cpus if we just have 3023 * a single map 3024 */ 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids) 3026 set->nr_hw_queues = nr_cpu_ids; Even the block layer would limit the number of hw queues by nr_cpu_ids when (set->nr_maps == 1). That's why I think virtio-blk should use the similar solution as nvme (regardless about write_queues and poll_queues) and xen-blkfront. Added Jason again. I do not know why the mailing list of virtualization at lists.linux-foundation.org always filters out Jason's email... Dongli Zhang
Cornelia Huck
2019-Mar-14 12:13 UTC
virtio-blk: should num_vqs be limited by num_possible_cpus()?
On Thu, 14 Mar 2019 14:12:32 +0800 Dongli Zhang <dongli.zhang at oracle.com> wrote:> On 3/13/19 5:39 PM, Cornelia Huck wrote: > > On Wed, 13 Mar 2019 11:26:04 +0800 > > Dongli Zhang <dongli.zhang at oracle.com> wrote: > > > >> On 3/13/19 1:33 AM, Cornelia Huck wrote: > >>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT) > >>> Dongli Zhang <dongli.zhang at oracle.com> wrote:> >>>> Is this by design on purpose, or can we fix with below? > >>>> > >>>> > >>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > >>>> index 4bc083b..df95ce3 100644 > >>>> --- a/drivers/block/virtio_blk.c > >>>> +++ b/drivers/block/virtio_blk.c > >>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk) > >>>> if (err) > >>>> num_vqs = 1; > >>>> > >>>> + num_vqs = min(num_possible_cpus(), num_vqs); > >>>> + > >>>> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL); > >>>> if (!vblk->vqs) > >>>> return -ENOMEM; > >>> > >>> virtio-blk, however, is not pci-specific. > >>> > >>> If we are using the ccw transport on s390, a completely different > >>> interrupt mechanism is in use ('floating' interrupts, which are not > >>> per-cpu). A check like that should therefore not go into the generic > >>> driver. > >>> > >> > >> So far there seems two options. > >> > >> The 1st option is to ask the qemu user to always specify "-num-queues" with the > >> same number of vcpus when running x86 guest with pci for virtio-blk or > >> virtio-scsi, in order to assign a vector for each queue. > > > > That does seem like an extra burden for the user: IIUC, things work > > even if you have too many queues, it's just not optimal. It sounds like > > something that can be done by a management layer (e.g. libvirt), though. > > > >> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops' > >> so that different platforms (e.g., pci or ccw) would use different ways to limit > >> the max number of queues in guest, with something like below? > > > > That sounds better, as both transports and drivers can opt-in here. > > > > However, maybe it would be even better to try to come up with a better > > strategy of allocating msix vectors in virtio-pci. More vectors in the > > num_queues > num_cpus case, even if they still need to be shared? > > Individual vectors for n-1 cpus and then a shared one for the remaining > > queues? > > > > It might even be device-specific: Have some low-traffic status queues > > share a vector, and provide an individual vector for high-traffic > > queues. Would need some device<->transport interface, obviously. > > > > This sounds a little bit similar to multiple hctx maps? > > So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw > queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most > nr_cpu_ids hw queues. > > 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set) > ... ... > 3021 /* > 3022 * There is no use for more h/w queues than cpus if we just have > 3023 * a single map > 3024 */ > 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids) > 3026 set->nr_hw_queues = nr_cpu_ids; > > Even the block layer would limit the number of hw queues by nr_cpu_ids when > (set->nr_maps == 1).Correct me if I'm wrong, but there seem to be two kinds of limitations involved here: - Allocation of msix vectors by the virtio-pci transport. We end up with shared vectors if we have more virtqueues than vcpus. Other transports may or may not have similar issues, but essentially, this is something that applies to all kind of virtio devices attached via the virtio-pci transport. - The block layer limits the number of hw queues to the number of vcpus. This applies only to virtio devices that interact with the block layer, but regardless of the virtio transport.> That's why I think virtio-blk should use the similar solution as nvme > (regardless about write_queues and poll_queues) and xen-blkfront.Ok, the hw queues limit from above would be an argument to limit to #vcpus in the virtio-blk driver, regardless of the transport used. (No idea if there are better ways to deal with this, I'm not familiar with the interface.) For virtio devices that don't interact with the block layer and are attached via the virtio-pci transport, it might still make sense to revisit vector allocation.
Dongli Zhang
2019-Mar-14 16:08 UTC
virtio-blk: should num_vqs be limited by num_possible_cpus()?
On 03/14/2019 08:13 PM, Cornelia Huck wrote:> On Thu, 14 Mar 2019 14:12:32 +0800 > Dongli Zhang <dongli.zhang at oracle.com> wrote: > >> On 3/13/19 5:39 PM, Cornelia Huck wrote: >>> On Wed, 13 Mar 2019 11:26:04 +0800 >>> Dongli Zhang <dongli.zhang at oracle.com> wrote: >>> >>>> On 3/13/19 1:33 AM, Cornelia Huck wrote: >>>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT) >>>>> Dongli Zhang <dongli.zhang at oracle.com> wrote: > >>>>>> Is this by design on purpose, or can we fix with below? >>>>>> >>>>>> >>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c >>>>>> index 4bc083b..df95ce3 100644 >>>>>> --- a/drivers/block/virtio_blk.c >>>>>> +++ b/drivers/block/virtio_blk.c >>>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk) >>>>>> if (err) >>>>>> num_vqs = 1; >>>>>> >>>>>> + num_vqs = min(num_possible_cpus(), num_vqs); >>>>>> + >>>>>> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL); >>>>>> if (!vblk->vqs) >>>>>> return -ENOMEM; >>>>> >>>>> virtio-blk, however, is not pci-specific. >>>>> >>>>> If we are using the ccw transport on s390, a completely different >>>>> interrupt mechanism is in use ('floating' interrupts, which are not >>>>> per-cpu). A check like that should therefore not go into the generic >>>>> driver. >>>>> >>>> >>>> So far there seems two options. >>>> >>>> The 1st option is to ask the qemu user to always specify "-num-queues" with the >>>> same number of vcpus when running x86 guest with pci for virtio-blk or >>>> virtio-scsi, in order to assign a vector for each queue. >>> >>> That does seem like an extra burden for the user: IIUC, things work >>> even if you have too many queues, it's just not optimal. It sounds like >>> something that can be done by a management layer (e.g. libvirt), though. >>> >>>> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops' >>>> so that different platforms (e.g., pci or ccw) would use different ways to limit >>>> the max number of queues in guest, with something like below? >>> >>> That sounds better, as both transports and drivers can opt-in here. >>> >>> However, maybe it would be even better to try to come up with a better >>> strategy of allocating msix vectors in virtio-pci. More vectors in the >>> num_queues > num_cpus case, even if they still need to be shared? >>> Individual vectors for n-1 cpus and then a shared one for the remaining >>> queues? >>> >>> It might even be device-specific: Have some low-traffic status queues >>> share a vector, and provide an individual vector for high-traffic >>> queues. Would need some device<->transport interface, obviously. >>> >> >> This sounds a little bit similar to multiple hctx maps? >> >> So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw >> queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most >> nr_cpu_ids hw queues. >> >> 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set) >> ... ... >> 3021 /* >> 3022 * There is no use for more h/w queues than cpus if we just have >> 3023 * a single map >> 3024 */ >> 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids) >> 3026 set->nr_hw_queues = nr_cpu_ids; >> >> Even the block layer would limit the number of hw queues by nr_cpu_ids when >> (set->nr_maps == 1). > > Correct me if I'm wrong, but there seem to be two kinds of limitations > involved here: > - Allocation of msix vectors by the virtio-pci transport. We end up > with shared vectors if we have more virtqueues than vcpus. Other > transports may or may not have similar issues, but essentially, this > is something that applies to all kind of virtio devices attached via > the virtio-pci transport.It depends. For virtio-net, we need to specify the number of available vectors on qemu side, e.g.,: -device virtio-net-pci,netdev=tapnet,mq=true,vectors=16 This parameter is specific for virtio-net. Suppose if 'queues=8' while 'vectors=16', as 2*8+1 > 16, there will be lack of vectors and guest would not be able to assign one vector for each queue. I was tortured by this long time ago and it seems qemu would minimize the memory allocation and the default 'vectors' is 3. BTW, why cannot we have a more consistent configuration for most qemu devices, e.g., so far: virtio-blk use 'num-queues' nvme use 'num_queues' virtio-net use 'queues' for tap :)> - The block layer limits the number of hw queues to the number of > vcpus. This applies only to virtio devices that interact with the > block layer, but regardless of the virtio transport.Yes: virtio-blk and virtio-scsi.> >> That's why I think virtio-blk should use the similar solution as nvme >> (regardless about write_queues and poll_queues) and xen-blkfront. > > Ok, the hw queues limit from above would be an argument to limit to > #vcpus in the virtio-blk driver, regardless of the transport used. (No > idea if there are better ways to deal with this, I'm not familiar with > the interface.) > > For virtio devices that don't interact with the block layer and are > attached via the virtio-pci transport, it might still make sense to > revisit vector allocation. >As mentioned above, we need to specify 'vectors' for virtio-net as the default value is only 3 (config + tx + rx). That would make a little difference? Dongli Zhang
Jason Wang
2019-Mar-15 04:50 UTC
virtio-blk: should num_vqs be limited by num_possible_cpus()?
On 2019/3/14 ??2:12, Dongli Zhang wrote:> > On 3/13/19 5:39 PM, Cornelia Huck wrote: >> On Wed, 13 Mar 2019 11:26:04 +0800 >> Dongli Zhang <dongli.zhang at oracle.com> wrote: >> >>> On 3/13/19 1:33 AM, Cornelia Huck wrote: >>>> On Tue, 12 Mar 2019 10:22:46 -0700 (PDT) >>>> Dongli Zhang <dongli.zhang at oracle.com> wrote: >>>> >>>>> I observed that there is one msix vector for config and one shared vector >>>>> for all queues in below qemu cmdline, when the num-queues for virtio-blk >>>>> is more than the number of possible cpus: >>>>> >>>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=6" >>>>> >>>>> # cat /proc/interrupts >>>>> CPU0 CPU1 CPU2 CPU3 >>>>> ... ... >>>>> 24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config >>>>> 25: 0 0 0 59 PCI-MSI 65537-edge virtio0-virtqueues >>>>> ... ... >>>>> >>>>> >>>>> However, when num-queues is the same as number of possible cpus: >>>>> >>>>> qemu: "-smp 4" while "-device virtio-blk-pci,drive=drive-0,id=virtblk0,num-queues=4" >>>>> >>>>> # cat /proc/interrupts >>>>> CPU0 CPU1 CPU2 CPU3 >>>>> ... ... >>>>> 24: 0 0 0 0 PCI-MSI 65536-edge virtio0-config >>>>> 25: 2 0 0 0 PCI-MSI 65537-edge virtio0-req.0 >>>>> 26: 0 35 0 0 PCI-MSI 65538-edge virtio0-req.1 >>>>> 27: 0 0 32 0 PCI-MSI 65539-edge virtio0-req.2 >>>>> 28: 0 0 0 0 PCI-MSI 65540-edge virtio0-req.3 >>>>> ... ... >>>>> >>>>> In above case, there is one msix vector per queue. >>>> Please note that this is pci-specific... >>>> >>>>> >>>>> This is because the max number of queues is not limited by the number of >>>>> possible cpus. >>>>> >>>>> By default, nvme (regardless about write_queues and poll_queues) and >>>>> xen-blkfront limit the number of queues with num_possible_cpus(). >>>> ...and these are probably pci-specific as well. >>> Not pci-specific, but per-cpu as well. >> Ah, I meant that those are pci devices. >> >>>> >>>>> >>>>> Is this by design on purpose, or can we fix with below? >>>>> >>>>> >>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c >>>>> index 4bc083b..df95ce3 100644 >>>>> --- a/drivers/block/virtio_blk.c >>>>> +++ b/drivers/block/virtio_blk.c >>>>> @@ -513,6 +513,8 @@ static int init_vq(struct virtio_blk *vblk) >>>>> if (err) >>>>> num_vqs = 1; >>>>> >>>>> + num_vqs = min(num_possible_cpus(), num_vqs); >>>>> + >>>>> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL); >>>>> if (!vblk->vqs) >>>>> return -ENOMEM; >>>> virtio-blk, however, is not pci-specific. >>>> >>>> If we are using the ccw transport on s390, a completely different >>>> interrupt mechanism is in use ('floating' interrupts, which are not >>>> per-cpu). A check like that should therefore not go into the generic >>>> driver. >>>> >>> So far there seems two options. >>> >>> The 1st option is to ask the qemu user to always specify "-num-queues" with the >>> same number of vcpus when running x86 guest with pci for virtio-blk or >>> virtio-scsi, in order to assign a vector for each queue. >> That does seem like an extra burden for the user: IIUC, things work >> even if you have too many queues, it's just not optimal. It sounds like >> something that can be done by a management layer (e.g. libvirt), though. >> >>> Or, is it fine for virtio folks to add a new hook to 'struct virtio_config_ops' >>> so that different platforms (e.g., pci or ccw) would use different ways to limit >>> the max number of queues in guest, with something like below? >> That sounds better, as both transports and drivers can opt-in here. >> >> However, maybe it would be even better to try to come up with a better >> strategy of allocating msix vectors in virtio-pci. More vectors in the >> num_queues > num_cpus case, even if they still need to be shared? >> Individual vectors for n-1 cpus and then a shared one for the remaining >> queues? >> >> It might even be device-specific: Have some low-traffic status queues >> share a vector, and provide an individual vector for high-traffic >> queues. Would need some device<->transport interface, obviously. >> > This sounds a little bit similar to multiple hctx maps? > > So far, as virtio-blk only supports set->nr_maps = 1, no matter how many hw > queues are assigned for virtio-blk, blk_mq_alloc_tag_set() would use at most > nr_cpu_ids hw queues. > > 2981 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set) > ... ... > 3021 /* > 3022 * There is no use for more h/w queues than cpus if we just have > 3023 * a single map > 3024 */ > 3025 if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids) > 3026 set->nr_hw_queues = nr_cpu_ids; > > Even the block layer would limit the number of hw queues by nr_cpu_ids when > (set->nr_maps == 1). > > That's why I think virtio-blk should use the similar solution as nvme > (regardless about write_queues and poll_queues) and xen-blkfront. > > Added Jason again. I do not know why the mailing list of > virtualization at lists.linux-foundation.org always filters out Jason's email... > > > Dongli ZhangOr something like I proposed several years ago? https://do-db2.lkml.org/lkml/2014/12/25/169 Btw, for virtio-net, I think we actually want to go for having a maximum number of supported queues like what hardware did. This would be useful for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current vector allocation doesn't support this which will results all virtqueues to share a single vector. We may indeed need more flexible policy here. Thanks
Cornelia Huck
2019-Mar-15 12:41 UTC
virtio-blk: should num_vqs be limited by num_possible_cpus()?
On Fri, 15 Mar 2019 12:50:11 +0800 Jason Wang <jasowang at redhat.com> wrote:> Or something like I proposed several years ago? > https://do-db2.lkml.org/lkml/2014/12/25/169 > > Btw, for virtio-net, I think we actually want to go for having a maximum > number of supported queues like what hardware did. This would be useful > for e.g cpu hotplug or XDP (requires per cpu TX queue). But the current > vector allocation doesn't support this which will results all virtqueues > to share a single vector. We may indeed need more flexible policy here.I think it should be possible for the driver to give the transport hints how to set up their queues/interrupt structures. (The driver probably knows best about its requirements.) Perhaps whether a queue is high or low frequency, or whether it should be low latency, or even whether two queues could share a notification mechanism without drawbacks. It's up to the transport to make use of that information, if possible.
Apparently Analagous Threads
- virtio-blk: should num_vqs be limited by num_possible_cpus()?
- virtio-blk: should num_vqs be limited by num_possible_cpus()?
- virtio-blk: should num_vqs be limited by num_possible_cpus()?
- virtio-blk: should num_vqs be limited by num_possible_cpus()?
- virtio-blk: should num_vqs be limited by num_possible_cpus()?