Xuan Zhuo
2023-Apr-27 08:51 UTC
[PATCH] virtio_net: suppress cpu stall when free_unused_bufs
On Thu, 27 Apr 2023 16:49:58 +0800, Wenliang Wang <wangwenliang.1995 at bytedance.com> wrote:
> On 4/27/23 4:23 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 27, 2023 at 04:13:45PM +0800, Xuan Zhuo wrote:
> >> On Thu, 27 Apr 2023 04:12:44 -0400, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >>> On Thu, Apr 27, 2023 at 03:13:44PM +0800, Xuan Zhuo wrote:
> >>>> On Thu, 27 Apr 2023 15:02:26 +0800, Wenliang Wang <wangwenliang.1995 at bytedance.com> wrote:
> >>>>>
> >>>>>
> >>>>> On 4/27/23 2:20 PM, Xuan Zhuo wrote:
> >>>>>> On Thu, 27 Apr 2023 12:34:33 +0800, Wenliang Wang <wangwenliang.1995 at bytedance.com> wrote:
> >>>>>>> For multi-queue and large rx-ring-size use case, the following error
> >>>>>>
> >>>>>> Cound you give we one number for example?
> >>>>>
> >>>>> 128 queues and 16K queue_size is typical.
> >>>>>
> >>>>>>
> >>>>>>> occurred when free_unused_bufs:
> >>>>>>> rcu: INFO: rcu_sched self-detected stall on CPU.
> >>>>>>>
> >>>>>>> Signed-off-by: Wenliang Wang <wangwenliang.1995 at bytedance.com>
> >>>>>>> ---
> >>>>>>>  drivers/net/virtio_net.c | 1 +
> >>>>>>>  1 file changed, 1 insertion(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index ea1bd4bb326d..21d8382fd2c7 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -3565,6 +3565,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >>>>>>>  		struct virtqueue *vq = vi->rq[i].vq;
> >>>>>>>  		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> >>>>>>>  			virtnet_rq_free_unused_buf(vq, buf);
> >>>>>>> +		schedule();
> >>>>>>
> >>>>>> Just for rq?
> >>>>>>
> >>>>>> Do we need to do the same thing for sq?
> >>>>> Rq buffers are pre-allocated, take seconds to free rq unused buffers.
> >>>>>
> >>>>> Sq unused buffers are much less, so do the same for sq is optional.
> >>>>
> >>>> I got.
> >>>>
> >>>> I think we should look for a way, compatible with the less queues or the smaller
> >>>> rings. Calling schedule() directly may be not a good way.
> >>>>
> >>>> Thanks.
> >>>
> >>> Why isn't it a good way?
> >>
> >> For the small ring, I don't think it is a good way, maybe we only deal with one
> >> buf, then call schedule().
> >>
> >> We can call the schedule() after processing a certain number of buffers,
> >> or check need_resched () first.
> >>
> >> Thanks.
> >
> >
> > Wenliang, does
> >     if (need_resched())
> >         schedule();
> > fix the issue for you?
> >
> Yeah, it works better.

I prefer to use it in combination with a fixed number (such as 256). Every time
256 buffers are processed, check need_resched(). This can accommodate large
rings and small rings.

Also, it is necessary to add similar logic to sq. Although the possibility is
low, it is possible that the same problem will occur.

Thanks.

> >
> >>
> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>>>> }
> >>>>>>> }
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.20.1
> >>>>>>>
> >>>
> >
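
For illustration only, the batching idea from this message (check need_resched() once every 256 buffers, and yield only if a reschedule is actually pending) could look roughly like the userspace sketch below. It is not the patch and not kernel code: stub_need_resched(), stub_schedule(), drain_unused_bufs() and FREE_BATCH are hypothetical names, the stubs only count calls, and the buffer "free" is a placeholder for the virtqueue_detach_unused_buf() loop quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's need_resched() and
 * schedule(). They only count calls so the control flow can be observed. */
static unsigned long resched_checks;
static unsigned long yields;
static bool resched_pending = true; /* assumption: a reschedule is always wanted */

static bool stub_need_resched(void)
{
	resched_checks++;
	return resched_pending;
}

static void stub_schedule(void)
{
	yields++;
}

#define FREE_BATCH 256 /* batch size suggested in the thread */

/* Drains nbufs pretend buffers and returns how many were freed.
 * The reschedule check runs once per FREE_BATCH buffers, so a small
 * ring never pays for it and a large ring cannot hog the CPU. */
static size_t drain_unused_bufs(size_t nbufs)
{
	size_t freed = 0;

	while (freed < nbufs) {
		/* Placeholder: the driver would detach and free one buffer here. */
		freed++;

		if (freed % FREE_BATCH == 0 && stub_need_resched())
			stub_schedule();
	}
	return freed;
}
```

With a 16K ring this checks 64 times per queue; with a ring smaller than 256 entries it never checks at all. In the kernel itself, cond_resched() is the existing helper for the "reschedule only if needed" part, so the stub pair above would probably collapse into a single call there.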