On 07/02/2023 22:57, Vivek Goyal wrote:
> On Tue, Feb 07, 2023 at 04:32:02PM -0500, Stefan Hajnoczi wrote:
>> On Tue, Feb 07, 2023 at 02:53:58PM -0500, Vivek Goyal wrote:
>>> On Tue, Feb 07, 2023 at 02:45:39PM -0500, Stefan Hajnoczi wrote:
>>>> On Tue, Feb 07, 2023 at 11:14:46AM +0100, Peter-Jan Gootzen wrote:
>>>>> Hi,
>>>>>
>>>
>>> [cc German]
>>>
>>>>> For my MSc thesis project in collaboration with IBM
>>>>> (https://github.com/IBM/dpu-virtio-fs) we are looking to improve the
>>>>> performance of the virtio-fs driver in high throughput scenarios. We think
>>>>> the main bottleneck is the fact that the virtio-fs driver does not support
>>>>> multi-queue (while the spec does). A big factor in this is that our setup on
>>>>> the virtio-fs device-side (a DPU) does not easily allow multiple cores to
>>>>> tend to a single virtio queue.
>>>
>>> This is an interesting limitation in DPU.
>>
>> Virtqueues are single-consumer queues anyway. Sharing them between
>> multiple threads would be expensive. I think using multiqueue is natural
>> and not specific to DPUs.
>
> Can we create multiple threads (a thread pool) on DPU and let these
> threads process requests in parallel (While there is only one virt
> queue).
>
> So this is what we had done in virtiofsd. One thread is dedicated to
> pull the requests from virt queue and then pass the request to thread
> pool to process it. And that seems to help with performance in
> certain cases.
>
> Is that possible on DPU? That itself can give a nice performance
> boost for certain workloads without having to implement multiqueue
> actually.
>
> Just curious. I am not opposed to the idea of multiqueue. I am
> just curious about the kind of performance gain (if any) it can
> provide. And will this be helpful for rust virtiofsd running on
> host as well?
>
> Thanks
> Vivek
>
There is technically nothing preventing us from consuming a single queue
on multiple cores. However, our current Virtio implementation (DPU-side)
is set up with the assumption that you should never want to do that (it
would cause concurrency mayhem around the virtqueues and the DMAs). So
instead of putting all the work into reworking the implementation to
support that, and still incurring the big overhead, we see it as more
fitting to amend the virtio-fs driver with multi-queue support.
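
As an aside, the single-consumer-plus-pool pattern Vivek describes could
be sketched very roughly as below. This is only an illustration in Rust
(nothing from virtiofsd or our DPU stack); channels stand in for the
virtqueue and for hand-off to the workers, and `Request` is a made-up
placeholder for a FUSE request:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a FUSE request pulled off the virtqueue.
struct Request(u32);

// One dispatcher thread is the sole virtqueue consumer; it hands
// requests out round-robin to a pool of workers over per-worker
// channels. Returns the total number of requests processed.
fn run_pool(n_requests: u32, n_workers: usize) -> u32 {
    let (vq_tx, vq_rx) = mpsc::channel::<Request>();

    let mut worker_txs = Vec::new();
    let mut workers = Vec::new();
    for _ in 0..n_workers {
        let (tx, rx) = mpsc::channel::<Request>();
        worker_txs.push(tx);
        workers.push(thread::spawn(move || {
            // Each worker processes its requests independently.
            rx.iter().count() as u32
        }));
    }

    // Dispatcher: the single consumer of the "virtqueue",
    // spreading work round-robin over the pool.
    let dispatcher = thread::spawn(move || {
        for (i, req) in vq_rx.iter().enumerate() {
            worker_txs[i % n_workers].send(req).unwrap();
        }
        // worker_txs is dropped here, which shuts the workers down.
    });

    // "Guest" side: enqueue the requests, then close the queue.
    for n in 0..n_requests {
        vq_tx.send(Request(n)).unwrap();
    }
    drop(vq_tx);

    dispatcher.join().unwrap();
    workers.into_iter().map(|w| w.join().unwrap()).sum()
}

fn main() {
    println!("processed {} requests", run_pool(100, 4));
}
```

The point being: the virtqueue itself stays single-consumer, and
parallelism only starts after the dispatcher hand-off, which is exactly
the hop that per-core queues would avoid.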
> Is it just a theory at this point of time or have you implemented
> it and seeing significant performance benefit with multiqueue?
It is a theory, but we are currently seeing that with the single request
queue, the one core attending to that queue on the DPU is reasonably
close to being fully saturated.
> And will this be helpful for rust virtiofsd running on
> host as well?
I figure this would depend on the workload and the user's needs. Having
many cores concurrently pull on their own virtqueue and then immediately
process the request locally would of course improve performance. But we
are offloading all this work to the DPU in order to provide
high-throughput cloud services.
> Sounds good. Assigning vqs round-robin is the strategy that virtio-net
> and virtio-blk use. virtio-blk could be an interesting example as it's
> similar to virtiofs. The Linux multiqueue block layer and core virtio
> irq allocation handle CPU affinity in the case of virtio-blk.
virtio-blk uses the queue assigned by the mq block layer, and virtio-net
the queue assigned by the net core layer, correct?
If I interpret you correctly, the round-robin strategy assigns cores to
queues round-robin, not per-request dynamically?
This is what I remembered as well, but I can't find it clearly in the
source right now; do you have references to the source for this?
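
To make sure we mean the same thing, here is a minimal sketch (my
understanding, not the actual kernel code) of the static assignment I
have in mind, where each CPU is bound to a queue once at setup time via
a simple modulo, rather than a queue being picked per request:

```rust
// Hypothetical static CPU -> virtqueue mapping, computed once at
// device setup (cpu modulo nr_queues), as opposed to dynamic
// per-request round-robin.
fn cpu_to_queue(cpu: usize, nr_queues: usize) -> usize {
    cpu % nr_queues
}

fn main() {
    let (nr_cpus, nr_queues) = (8, 3);
    for cpu in 0..nr_cpus {
        // Every request submitted on a given CPU always lands on the
        // same queue, so each queue keeps a single consumer.
        println!("cpu {cpu} -> vq {}", cpu_to_queue(cpu, nr_queues));
    }
}
```

With a mapping like this, CPU affinity comes for free: a queue is only
ever touched by the fixed set of CPUs that map to it.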
> Which DPU are you targetting?
This is something I unfortunately can't disclose at the moment.
Thanks,
Peter-Jan