thr3ads.net - Nouveau - [Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ [Nov 2015]

If this information is useful, please help other people find it:
Share via:

Karol Herbst

2015-Nov-14 18:44 UTC

[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ

I encountered while stresstesting the reclocking code, that rarely (1 out of
20.000+ requests) we don't get any IRQ in nvkm_pmu_intr.

This means we have a queued message on the pmu, but nouveau doesn't read it
and
waits infinitely in nvkm_pmu_send:
	if (reply) {
		wait_event(pmu->recv.wait, (pmu->recv.process == 0));

therefore let us use wait_event_timeout with a 1s timeout frame and just check
whether there is a message queued and handle it if there is one.

Return -ETIMEDOUT whenever we timed out and there is no message queued or when
we hit another timeout while trying to read the message without getting any IRQ

The benefit of not using wait_event is, that we don't have a kworker waiting
on an event, which makes it easier to reload the module at runtime, which helps
me developing on nouveau on my laptop a lot, because I don't need to reboot
anymore

Nethertheless, we shouldn't use wait_event here, because we can't
guarantee any
answere at all, can we?

v2: moved it into a new function

Signed-off-by: Karol Herbst <nouveau at karolherbst.de>
---
 drm/nouveau/nvkm/subdev/pmu/base.c | 43 ++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/pmu/base.c
b/drm/nouveau/nvkm/subdev/pmu/base.c
index 6b2007f..fafbe2a 100644
--- a/drm/nouveau/nvkm/subdev/pmu/base.c
+++ b/drm/nouveau/nvkm/subdev/pmu/base.c
@@ -43,6 +43,41 @@ nvkm_pmu_handle_reclk_request(struct work_struct *work)
 	nvkm_clk_pmu_reclk_request(clk, pmu->intr.data[0]);
 }
 
+static int
+wait_for_pmu_reply(struct nvkm_pmu *pmu, u32 reply[2])
+{
+	struct nvkm_subdev *subdev = &pmu->subdev;
+	struct nvkm_device *device = subdev->device;
+	unsigned long jiffies = msecs_to_jiffies(1000);
+
+	if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0),
jiffies)) {
+		u32 addr = nvkm_rd32(device, 0x10a4cc);
+		nvkm_error(subdev, "wait on reply timed out\n");
+
+		if (addr != nvkm_rd32(device, 0x10a4c8)) {
+			nvkm_error(subdev, "found queued message without getting an
interrupt\n");
+			schedule_work(&pmu->recv.work);
+
+			if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process == 0),
jiffies)) {
+				nvkm_error(subdev, "failed to repair PMU state\n");
+				goto reply_error;
+			}
+		} else
+			goto reply_error;
+	}
+
+	reply[0] = pmu->recv.data[0];
+	reply[1] = pmu->recv.data[1];
+	mutex_unlock(&subdev->mutex);
+	return 0;
+
+reply_error:
+	reply[0] = 0;
+	reply[1] = 0;
+	mutex_unlock(&subdev->mutex);
+	return -ETIMEDOUT;
+}
+
 int
 nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
 	      u32 process, u32 message, u32 data0, u32 data1)
@@ -88,12 +123,8 @@ nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
 	nvkm_wr32(device, 0x10a580, 0x00000000);
 
 	/* wait for reply, if requested */
-	if (reply) {
-		wait_event(pmu->recv.wait, (pmu->recv.process == 0));
-		reply[0] = pmu->recv.data[0];
-		reply[1] = pmu->recv.data[1];
-		mutex_unlock(&subdev->mutex);
-	}
+	if (reply)
+		return wait_for_pmu_reply(pmu, reply);
 
 	return 0;
 }
-- 
2.6.3

Ilia Mirkin

2015-Nov-14 18:50 UTC

head link

[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ

On Sat, Nov 14, 2015 at 1:44 PM, Karol Herbst <nouveau at karolherbst.de>
wrote:> I encountered while stresstesting the reclocking code, that rarely (1 out
of
> 20.000+ requests) we don't get any IRQ in nvkm_pmu_intr.
>
> This means we have a queued message on the pmu, but nouveau doesn't
read it and
> waits infinitely in nvkm_pmu_send:
>         if (reply) {
>                 wait_event(pmu->recv.wait, (pmu->recv.process == 0));
>
> therefore let us use wait_event_timeout with a 1s timeout frame and just
check
> whether there is a message queued and handle it if there is one.
>
> Return -ETIMEDOUT whenever we timed out and there is no message queued or
when
> we hit another timeout while trying to read the message without getting any
IRQ
>
> The benefit of not using wait_event is, that we don't have a kworker
waiting
> on an event, which makes it easier to reload the module at runtime, which
helps
> me developing on nouveau on my laptop a lot, because I don't need to
reboot
> anymore
>
> Nethertheless, we shouldn't use wait_event here, because we can't
guarantee any
> answere at all, can we?
>
> v2: moved it into a new function
>
> Signed-off-by: Karol Herbst <nouveau at karolherbst.de>
> ---
>  drm/nouveau/nvkm/subdev/pmu/base.c | 43
++++++++++++++++++++++++++++++++------
>  1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drm/nouveau/nvkm/subdev/pmu/base.c
b/drm/nouveau/nvkm/subdev/pmu/base.c
> index 6b2007f..fafbe2a 100644
> --- a/drm/nouveau/nvkm/subdev/pmu/base.c
> +++ b/drm/nouveau/nvkm/subdev/pmu/base.c
> @@ -43,6 +43,41 @@ nvkm_pmu_handle_reclk_request(struct work_struct *work)
>         nvkm_clk_pmu_reclk_request(clk, pmu->intr.data[0]);
>  }
>
> +static int
> +wait_for_pmu_reply(struct nvkm_pmu *pmu, u32 reply[2])
> +{
> +       struct nvkm_subdev *subdev = &pmu->subdev;
> +       struct nvkm_device *device = subdev->device;
> +       unsigned long jiffies = msecs_to_jiffies(1000);
> +
> +       if (!wait_event_timeout(pmu->recv.wait, (pmu->recv.process ==
0), jiffies)) {
> +               u32 addr = nvkm_rd32(device, 0x10a4cc);
> +               nvkm_error(subdev, "wait on reply timed out\n");
> +
> +               if (addr != nvkm_rd32(device, 0x10a4c8)) {
> +                       nvkm_error(subdev, "found queued message
without getting an interrupt\n");
> +                       schedule_work(&pmu->recv.work);
> +
> +                       if (!wait_event_timeout(pmu->recv.wait,
(pmu->recv.process == 0), jiffies)) {
> +                               nvkm_error(subdev, "failed to repair
PMU state\n");
> +                               goto reply_error;
> +                       }
> +               } else
Not sure whether kernel style dictates this, but I really hate these
"hanging" else's... both sides should have brackets if either one
does.
> +                       goto reply_error;
> +       }
> +
> +       reply[0] = pmu->recv.data[0];
> +       reply[1] = pmu->recv.data[1];
> +       mutex_unlock(&subdev->mutex);
> +       return 0;
> +
> +reply_error:
> +       reply[0] = 0;
> +       reply[1] = 0;
> +       mutex_unlock(&subdev->mutex);
> +       return -ETIMEDOUT;
> +}
> +
>  int
>  nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
>               u32 process, u32 message, u32 data0, u32 data1)
> @@ -88,12 +123,8 @@ nvkm_pmu_send(struct nvkm_pmu *pmu, u32 reply[2],
>         nvkm_wr32(device, 0x10a580, 0x00000000);
>
>         /* wait for reply, if requested */
> -       if (reply) {
> -               wait_event(pmu->recv.wait, (pmu->recv.process == 0));
> -               reply[0] = pmu->recv.data[0];
> -               reply[1] = pmu->recv.data[1];
> -               mutex_unlock(&subdev->mutex);
> -       }
> +       if (reply)
> +               return wait_for_pmu_reply(pmu, reply);
Having one function lock and another unlock is a disaster waiting to
happen. Perhaps make wiat_for_pmu_reply not handle the unlock and
instead do

int ret = 0;
if (reply)
  ret = wait_for_pmu_reply()

return ret;

Additionally leaving the reply[] filling in this function would allow
you to avoid annoying error handling and goto's in the other function.
>
>         return 0;
>  }
> --
> 2.6.3
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

Apparently Analagous Threads

Search for more apparently analagous threads

Nouveau - Nov 2015 - [PATCH v2] pmu: fix queued messages while getting no IRQ

[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ

[Nouveau] [PATCH v2] pmu: fix queued messages while getting no IRQ

Apparently Analagous Threads