Jason Wang
2022-Mar-25 03:22 UTC
[PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU
On Thu, Mar 24, 2022 at 8:24 PM Eli Cohen <elic at nvidia.com> wrote:
>
> > -----Original Message-----
> > From: Hillf Danton <hdanton at sina.com>
> > Sent: Thursday, March 24, 2022 2:02 PM
> > To: Jason Wang <jasowang at redhat.com>
> > Cc: Eli Cohen <elic at nvidia.com>; Michael S. Tsirkin <mst at redhat.com>; virtualization <virtualization at lists.linux-foundation.org>; linux-kernel <linux-kernel at vger.kernel.org>
> > Subject: Re: [PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU
> >
> > On Thu, 24 Mar 2022 16:20:34 +0800 Jason Wang wrote:
> > > On Thu, Mar 24, 2022 at 2:17 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > On Thu, Mar 24, 2022 at 02:04:19PM +0800, Hillf Danton wrote:
> > > > > On Thu, 24 Mar 2022 10:34:09 +0800 Jason Wang wrote:
> > > > > > On Thu, Mar 24, 2022 at 8:54 AM Hillf Danton <hdanton at sina.com> wrote:
> > > > > > >
> > > > > > > On Tue, 22 Mar 2022 09:59:14 +0800 Jason Wang wrote:
> > > > > > > >
> > > > > > > > Yes, there will be no "infinite" loop, but since the loop is triggered
> > > > > > > > by userspace. It looks to me it will delay the flush/drain of the
> > > > > > > > workqueue forever which is still suboptimal.
> > > > > > >
> > > > > > > Usually it is barely possible to shoot two birds using a stone.
> > > > > > >
> > > > > > > Given the "forever", I am inclined to not running faster, hehe, though
> > > > > > > another cobble is to add another line in the loop checking if mvdev is
> > > > > > > unregistered, and for example make mvdev->cvq unready before destroying
> > > > > > > workqueue.
> > > > > > >
> > > > > > > static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *dev)
> > > > > > > {
> > > > > > > 	struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev);
> > > > > > > 	struct mlx5_vdpa_dev *mvdev = to_mvdev(dev);
> > > > > > > 	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > >
> > > > > > > 	mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
> > > > > > > 	destroy_workqueue(mvdev->wq);
> > > > > > > 	_vdpa_unregister_device(dev);
> > > > > > > 	mgtdev->ndev = NULL;
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > Yes, so we had
> > > > > >
> > > > > > 1) using a quota for re-requeue
> > > > > > 2) using something like
> > > > > >
> > > > > > while (READ_ONCE(cvq->ready)) {
> > > > > > 	...
> > > > > > 	cond_resched();
> > > > > > }
> > > > > >
> > > > > > There should not be too much difference except we need to use
> > > > > > cancel_work_sync() instead of flush_work for 1).
> > > > > >
> > > > > > I would keep the code as is but if you stick I can change.
> > > > >
> > > > > No Sir I would not - I am simply not a fan of work requeue.
> > > > >
> > > > > Hillf
> > > >
> > > > I think I agree - requeue adds latency spikes under heavy load -
> > > > unfortunately, not measured by netperf but still important
> > > > for latency sensitive workloads. Checking a flag is cheaper.
> > >
> > > Just spot another possible issue.
> > >
> > > The workqueue will be used by another work to update the carrier
> > > (event_handler()). Using cond_resched() may still have unfair issue
> > > which blocks the carrier update for infinite time,
> >
> > Then would you please specify the reason why mvdev->wq is single
> > threaded?

I didn't see a reason why it needs to be a single threaded (ordered).

> > Given requeue, the serialization of the two works is not
> > strong. Otherwise unbound WQ that can process works in parallel is
> > a cure to the unfairness above.

Yes, and we probably don't want a per device workqueue but a per
module one. Or simply use the system_wq one.

> >
>
> I think the proposed patch can still be used with quota equal to one.
> That would guarantee fairness.
> This is not performance critical and a single workqueue should be enough.

Yes, but both Hillf and Michael don't like requeuing. So my plan is

1) send patch 2 first since it's a hard requirement for the next RHEL release
2) a series to fix this hogging issue by
2.1) switch to use a per module workqueue
2.2) READ_ONCE(cvq->ready) + cond_resched()

Thanks

>
> > Thanks
> > Hillf
>
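The fairness concern raised against the READ_ONCE(cvq->ready) + cond_resched() loop can be illustrated with a small userspace model. It assumes, as described in the thread, that the cvq work and the carrier-update work (event_handler()) share one ordered workqueue, so only one item runs at a time, in queue order. `struct ordered_wq`, `cmds_before_carrier()` and the finite `pending` batch (standing in for userspace that keeps posting commands) are hypothetical names for illustration only; this is a sketch, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an ordered (single-threaded) workqueue:
 * a FIFO whose items run one at a time, in order. Not kernel code. */
enum work_kind { CVQ_WORK, CARRIER_WORK };

#define QCAP 64

struct ordered_wq {
	enum work_kind items[QCAP];
	int head, tail;
};

static void wq_push(struct ordered_wq *q, enum work_kind w)
{
	assert(q->tail - q->head < QCAP);	/* the model has a fixed capacity */
	q->items[q->tail++ % QCAP] = w;
}

static bool wq_pop(struct ordered_wq *q, enum work_kind *w)
{
	if (q->head == q->tail)
		return false;
	*w = q->items[q->head++ % QCAP];
	return true;
}

/*
 * A cvq kick and a carrier update are queued back to back. The cvq work
 * handles up to `quota` commands per run and requeues itself (behind the
 * carrier work) if more are pending. quota <= 0 models a loop that runs
 * until no commands are left; a cond_resched() inside such a loop would
 * yield the CPU but would not let the next item of an ordered queue start.
 * Returns how many control commands were handled before the carrier work ran.
 */
static int cmds_before_carrier(int pending, int quota)
{
	struct ordered_wq q = { .head = 0, .tail = 0 };
	enum work_kind w;
	int handled = 0;

	wq_push(&q, CVQ_WORK);
	wq_push(&q, CARRIER_WORK);

	while (wq_pop(&q, &w)) {
		if (w == CARRIER_WORK)
			return handled;

		int budget = quota;
		while (pending > 0 && (quota <= 0 || budget-- > 0)) {
			pending--;	/* handle one control command */
			handled++;
		}
		if (pending > 0)
			wq_push(&q, CVQ_WORK);	/* requeue behind the carrier work */
	}
	return handled;
}
```

With ten pending commands, `cmds_before_carrier(10, 0)` (the unbounded loop) handles all ten before the carrier update runs, while `cmds_before_carrier(10, 1)` lets it run after the first command. Hillf's alternative, an unbound workqueue that can run the two works in parallel, addresses the same problem by removing the ordering rather than by bounding the cvq work.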
Michael S. Tsirkin
2022-Mar-25 06:45 UTC
[PATCH 1/2] vdpa: mlx5: prevent cvq work from hogging CPU
On Fri, Mar 25, 2022 at 11:22:25AM +0800, Jason Wang wrote:
> On Thu, Mar 24, 2022 at 8:24 PM Eli Cohen <elic at nvidia.com> wrote:
> [...]
> > I think the proposed patch can still be used with quota equal to one.
> > That would guarantee fairness.
> > This is not performance critical and a single workqueue should be enough.
>
> Yes, but both Hillf and Michael don't like requeuing. So my plan is
>
> 1) send patch 2 first since it's a hard requirement for the next RHEL release
> 2) a series to fix this hogging issue by
> 2.1) switch to use a per module workqueue
> 2.2) READ_ONCE(cvq->ready) + cond_resched()
>
> Thanks

Actually if we don't care about speed here then requeuing with quota of 1
is fine, in that we don't have a quota at all, we just always requeue
instead of a loop. It's the mix of requeue and a loop that I consider
confusing.
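Michael's "always requeue instead of a loop" shape, one command per invocation with no inner loop, can be sketched in a userspace model. Everything below is hypothetical: `struct cvq_model`, `model_queue_work()`, `cvq_work_fn()` and `model_run_until_idle()` only imitate two workqueue behaviours (a work item is pending at most once, and its pending state is cleared before its function runs, so it may requeue itself); they are not mlx5 or kernel APIs. The `ready` field stands in for the `cvq->ready` check from Jason's plan, so clearing it ends the requeue chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are mlx5 or kernel APIs. */
struct cvq_model {
	bool ready;		/* stands in for cvq->ready */
	bool work_pending;	/* is the cvq work item queued? */
	int cmds;		/* commands posted and not yet handled */
	int handled;		/* commands processed so far */
};

/* Like queue_work(): returns false if the item was already pending. */
static bool model_queue_work(struct cvq_model *m)
{
	if (m->work_pending)
		return false;
	m->work_pending = true;
	return true;
}

/* "Always requeue" handler: one command per invocation, no inner loop. */
static void cvq_work_fn(struct cvq_model *m)
{
	assert(!m->work_pending);	/* pending is cleared before the function runs */

	if (!m->ready || m->cmds == 0)
		return;			/* device going away or nothing to do: chain ends */

	m->cmds--;
	m->handled++;
	if (m->cmds > 0)
		model_queue_work(m);	/* go to the back of the queue for the next one */
}

/* Run the queued work until nothing is pending; returns the number of invocations. */
static int model_run_until_idle(struct cvq_model *m)
{
	int runs = 0;

	while (m->work_pending) {
		m->work_pending = false;	/* dequeued before its function runs */
		cvq_work_fn(m);
		runs++;
	}
	return runs;
}
```

Queued once with five commands and `ready` set, `model_run_until_idle()` makes five invocations, one per command; with `ready` cleared it makes a single invocation that does nothing. The extra queue round trip per command is the speed cost Michael is willing to accept here.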