thr3ads.net - Virtualization - [PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU [Mar 2022]

Jason Wang

2022-Mar-25 03:22 UTC

[PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU

On Thu, Mar 24, 2022 at 8:24 PM Eli Cohen <elic at nvidia.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Hillf Danton <hdanton at sina.com>
> > Sent: Thursday, March 24, 2022 2:02 PM
> > To: Jason Wang <jasowang at redhat.com>
> > Cc: Eli Cohen <elic at nvidia.com>; Michael S. Tsirkin <mst at redhat.com>;
> > virtualization <virtualization at lists.linux-foundation.org>;
> > linux-kernel <linux-kernel at vger.kernel.org>
> > Subject: Re: [PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU
> >
> > On Thu, 24 Mar 2022 16:20:34 +0800 Jason Wang wrote:
> > > On Thu, Mar 24, 2022 at 2:17 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > On Thu, Mar 24, 2022 at 02:04:19PM +0800, Hillf Danton wrote:
> > > > > On Thu, 24 Mar 2022 10:34:09 +0800 Jason Wang wrote:
> > > > > > On Thu, Mar 24, 2022 at 8:54 AM Hillf Danton <hdanton at sina.com> wrote:
> > > > > > >
> > > > > > > On Tue, 22 Mar 2022 09:59:14 +0800 Jason Wang wrote:
> > > > > > > >
> > > > > > > > Yes, there will be no "infinite" loop, but since the loop is triggered
> > > > > > > > by userspace. It looks to me it will delay the flush/drain of the
> > > > > > > > workqueue forever which is still suboptimal.
> > > > > > >
> > > > > > > Usually it is barely possible to shoot two birds using a stone.
> > > > > > >
> > > > > > > Given the "forever", I am inclined to not running faster, hehe, though
> > > > > > > another cobble is to add another line in the loop checking if mvdev is
> > > > > > > unregistered, and for example make mvdev->cvq unready before destroying
> > > > > > > workqueue.
> > > > > > >
> > > > > > > static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *dev)
> > > > > > > {
> > > > > > >         struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev);
> > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(dev);
> > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > >
> > > > > > >         mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
> > > > > > >         destroy_workqueue(mvdev->wq);
> > > > > > >         _vdpa_unregister_device(dev);
> > > > > > >         mgtdev->ndev = NULL;
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > Yes, so we had
> > > > > >
> > > > > > 1) using a quota for re-requeue
> > > > > > 2) using something like
> > > > > >
> > > > > > while (READ_ONCE(cvq->ready)) {
> > > > > >         ...
> > > > > >         cond_resched();
> > > > > > }
> > > > > >
> > > > > > There should not be too much difference except we need to use
> > > > > > cancel_work_sync() instead of flush_work for 1).
> > > > > >
> > > > > > I would keep the code as is but if you stick I can change.
> > > > >
> > > > > No Sir I would not - I am simply not a fan of work requeue.
> > > > >
> > > > > Hillf
> > > >
> > > > I think I agree - requeue adds latency spikes under heavy load -
> > > > unfortunately, not measured by netperf but still important
> > > > for latency sensitive workloads. Checking a flag is cheaper.
> > >
> > > Just spot another possible issue.
> > >
> > > The workqueue will be used by another work to update the carrier
> > > (event_handler()). Using cond_resched() may still have unfair issue
> > > which blocks the carrier update for infinite time,
> >
> > Then would you please specify the reason why mvdev->wq is single
> > threaded?

I didn't see a reason why it needs to be a single threaded (ordered).

> > Given requeue, the serialization of the two works is not
> > strong. Otherwise unbound WQ that can process works in parallel is
> > a cure to the unfairness above.

Yes, and we probably don't want a per device workqueue but a per
module one. Or simply use the system_wq one.
> >
>
> I think the proposed patch can still be used with quota equal to one.
> That would guarantee fairness.
> This is not performance critical and a single workqueue should be enough.

Yes, but both Hillf and Michael don't like requeuing. So my plan is

1) send patch 2 first since it's a hard requirement for the next RHEL
release
2) a series to fix this hogging issue by
2.1) switch to use a per module workqueue
2.2) READ_ONCE(cvq->ready) + cond_resched()

Thanks
>
> > Thanks
> > Hillf
>
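
[Editor's sketch] Option 2.2 in the plan above, a loop that rechecks cvq->ready on every pass and yields between commands, can be modelled in userspace roughly as follows. This is only an illustration under stated assumptions, not the mlx5 driver code: struct sim_cvq, sim_cvq_work() and sim_cvq_teardown() are hypothetical names, a relaxed C11 atomic load stands in for READ_ONCE(), and sched_yield() stands in for cond_resched().

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the control virtqueue state. */
struct sim_cvq {
	atomic_bool ready; /* models cvq->ready; cleared on teardown */
	int pending;       /* commands still waiting in the queue */
	int handled;       /* commands processed so far */
};

/* Models "make cvq unready before destroying the workqueue". */
static void sim_cvq_teardown(struct sim_cvq *cvq)
{
	atomic_store_explicit(&cvq->ready, false, memory_order_relaxed);
}

/*
 * Work handler: keep handling commands while the queue is ready.
 * The flag is re-read on every iteration, so a concurrent teardown
 * ends the loop after at most one more command.
 * Returns the number of commands handled in this invocation.
 */
static int sim_cvq_work(struct sim_cvq *cvq)
{
	int n = 0;

	while (atomic_load_explicit(&cvq->ready, memory_order_relaxed)) {
		if (cvq->pending == 0)
			break;
		cvq->pending--;
		cvq->handled++;
		n++;
		sched_yield(); /* userspace stand-in for cond_resched() */
	}
	return n;
}
```

Once sim_cvq_teardown() has run, a later call to sim_cvq_work() returns immediately without touching the queue, which is the property that lets the workqueue be flushed and destroyed even if userspace keeps submitting commands.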

Michael S. Tsirkin

2022-Mar-25 06:45 UTC


[PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU

On Fri, Mar 25, 2022 at 11:22:25AM +0800, Jason Wang wrote:
> On Thu, Mar 24, 2022 at 8:24 PM Eli Cohen <elic at nvidia.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Hillf Danton <hdanton at sina.com>
> > > Sent: Thursday, March 24, 2022 2:02 PM
> > > To: Jason Wang <jasowang at redhat.com>
> > > Cc: Eli Cohen <elic at nvidia.com>; Michael S. Tsirkin <mst at redhat.com>;
> > > virtualization <virtualization at lists.linux-foundation.org>;
> > > linux-kernel <linux-kernel at vger.kernel.org>
> > > Subject: Re: [PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU
> > >
> > > On Thu, 24 Mar 2022 16:20:34 +0800 Jason Wang wrote:
> > > > On Thu, Mar 24, 2022 at 2:17 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > On Thu, Mar 24, 2022 at 02:04:19PM +0800, Hillf Danton wrote:
> > > > > > On Thu, 24 Mar 2022 10:34:09 +0800 Jason Wang wrote:
> > > > > > > On Thu, Mar 24, 2022 at 8:54 AM Hillf Danton <hdanton at sina.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 22 Mar 2022 09:59:14 +0800 Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > Yes, there will be no "infinite" loop, but since the loop is triggered
> > > > > > > > > by userspace. It looks to me it will delay the flush/drain of the
> > > > > > > > > workqueue forever which is still suboptimal.
> > > > > > > >
> > > > > > > > Usually it is barely possible to shoot two birds using a stone.
> > > > > > > >
> > > > > > > > Given the "forever", I am inclined to not running faster, hehe, though
> > > > > > > > another cobble is to add another line in the loop checking if mvdev is
> > > > > > > > unregistered, and for example make mvdev->cvq unready before destroying
> > > > > > > > workqueue.
> > > > > > > >
> > > > > > > > static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *dev)
> > > > > > > > {
> > > > > > > >         struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev);
> > > > > > > >         struct mlx5_vdpa_dev *mvdev = to_mvdev(dev);
> > > > > > > >         struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > >
> > > > > > > >         mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
> > > > > > > >         destroy_workqueue(mvdev->wq);
> > > > > > > >         _vdpa_unregister_device(dev);
> > > > > > > >         mgtdev->ndev = NULL;
> > > > > > > > }
> > > > > > > >
> > > > > > >
> > > > > > > Yes, so we had
> > > > > > >
> > > > > > > 1) using a quota for re-requeue
> > > > > > > 2) using something like
> > > > > > >
> > > > > > > while (READ_ONCE(cvq->ready)) {
> > > > > > >         ...
> > > > > > >         cond_resched();
> > > > > > > }
> > > > > > >
> > > > > > > There should not be too much difference except we need to use
> > > > > > > cancel_work_sync() instead of flush_work for 1).
> > > > > > >
> > > > > > > I would keep the code as is but if you stick I can change.
> > > > > >
> > > > > > No Sir I would not - I am simply not a fan of work requeue.
> > > > > >
> > > > > > Hillf
> > > > >
> > > > > I think I agree - requeue adds latency spikes under heavy load -
> > > > > unfortunately, not measured by netperf but still important
> > > > > for latency sensitive workloads. Checking a flag is cheaper.
> > > >
> > > > Just spot another possible issue.
> > > >
> > > > The workqueue will be used by another work to update the carrier
> > > > (event_handler()). Using cond_resched() may still have unfair issue
> > > > which blocks the carrier update for infinite time,
> > >
> > > Then would you please specify the reason why mvdev->wq is single
> > > threaded?
> 
> I didn't see a reason why it needs to be a single threaded (ordered).
> 
> > > Given requeue, the serialization of the two works is not
> > > strong. Otherwise unbound WQ that can process works in parallel is
> > > a cure to the unfairness above.
> 
> Yes, and we probably don't want a per device workqueue but a per
> module one. Or simply use the system_wq one.
> 
> > >
> >
> > I think the proposed patch can still be used with quota equal to one.
> > That would guarantee fairness.
> > This is not performance critical and a single workqueue should be enough.
> 
> Yes, but both Hillf and Michael don't like requeuing. So my plan is
> 
> 1) send patch 2 first since it's a hard requirement for the next RHEL
> release
> 2) a series to fix this hogging issue by
> 2.1) switch to use a per module workqueue
> 2.2) READ_ONCE(cvq->ready) + cond_resched()
> 
> Thanks

Actually if we don't care about speed here then requeuing with quota of 1
is fine, in that we don't have a quota at all, we just always requeue
instead of a loop.

It's the mix of requeue and a loop that I consider confusing.

> >
> > > Thanks
> > > Hillf
> >
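
[Editor's sketch] The "always requeue instead of a loop" variant described above can be modelled with a small FIFO that plays the role of an ordered, single-threaded workqueue. Everything here is hypothetical and for illustration only: the sim_* names are not kernel APIs, and the shared counter is just a logical clock used to show when the carrier update gets to run. Each invocation of the cvq handler processes exactly one command and asks to be requeued if more remain, so other work queued behind it is served after a single command.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_WQ_CAP 16

/* A work handler returns true if it wants to be queued again. */
typedef bool (*sim_work_fn)(void *ctx);

struct sim_work {
	sim_work_fn fn;
	void *ctx;
};

/* Hypothetical ordered workqueue: a fixed-size FIFO ring. */
struct sim_wq {
	struct sim_work slots[SIM_WQ_CAP];
	size_t head;
	size_t len;
};

static bool sim_queue_work(struct sim_wq *wq, sim_work_fn fn, void *ctx)
{
	if (wq->len == SIM_WQ_CAP)
		return false;
	wq->slots[(wq->head + wq->len) % SIM_WQ_CAP] = (struct sim_work){ fn, ctx };
	wq->len++;
	return true;
}

/* Run queued works in FIFO order; returns the number of invocations. */
static int sim_run(struct sim_wq *wq)
{
	int runs = 0;

	while (wq->len > 0) {
		struct sim_work w = wq->slots[wq->head];

		wq->head = (wq->head + 1) % SIM_WQ_CAP;
		wq->len--;
		runs++;
		if (w.fn(w.ctx))
			sim_queue_work(wq, w.fn, w.ctx); /* goes to the tail */
	}
	return runs;
}

struct sim_cvq_ctx {
	int pending; /* commands left to handle */
	int *clock;  /* shared logical clock */
};

struct sim_carrier_ctx {
	int *clock;
	int ran_at;  /* clock value at which the carrier update ran */
};

/* No loop and no quota: handle one command, requeue if more remain. */
static bool sim_cvq_handler(void *p)
{
	struct sim_cvq_ctx *c = p;

	if (c->pending > 0)
		c->pending--;
	(*c->clock)++;
	return c->pending > 0;
}

/* Models the carrier update work that shares the same workqueue. */
static bool sim_carrier_handler(void *p)
{
	struct sim_carrier_ctx *c = p;

	c->ran_at = ++(*c->clock);
	return false;
}
```

With four commands pending and the carrier update queued right behind the cvq work, the carrier handler runs second rather than after all four commands. The cost is one extra trip through the queue per command, which is the latency concern raised earlier in the thread.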
