Tobias Klausmann
2018-Jan-26 13:40 UTC
[Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Not sure I completely understand what you intend to say here. With this patch we prevent hwmon from reporting utterly wrong temperature values by returning an error instead (granted, we could return -EBUSY or something else). And if the device is shadowed, getting a sane temperature value out of it seems unlikely to me!

Greetings,
Tobias

On 1/26/18 12:40 PM, Karol Herbst wrote:
> no, we can't do that. We actually have to prevent this from hwmon. The
> issue here is, that the reg read returns 0xffffffff and parsing that
> is the first step in the first place.
>
> On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>> This fixes wrong temperature outputs e.g. 511°C if the card is asleep.
>>
>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>> ---
>>  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
>> index 9f0dea3f61dc..45d0ec632b5a 100644
>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
>> @@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
>>  	u32 inttemp = (tsensor & 0x0001fff8);
>>
>>  	/* device SHADOWed */
>> -	if (tsensor & 0x40000000)
>> +	if (tsensor & 0x40000000) {
>>  		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
>> +		return -ENODEV;
>> +	}
>>
>>  	/* device valid */
>>  	if (tsensor & 0x20000000)
>> --
>> 2.16.1
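To make the effect of the quoted hunk concrete, here is a minimal stand-alone sketch of the decode step. It takes the raw register value as an argument instead of reading hardware. The bit masks come from the hunk; the function name `sketch_temp_get`, the `reject_shadowed` switch, the `>> 8` scaling, and the final `-ENODEV` for a reading without the valid bit are assumptions, since those lines fall outside the quoted context. The assumed shift is at least consistent with the 511°C figure mentioned in the patch description.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as it appears in the quoted hunk. */
#define TSENSOR_SHADOWED  0x40000000u
#define TSENSOR_VALID     0x20000000u
#define TSENSOR_TEMP_MASK 0x0001fff8u

/*
 * Hypothetical model of gp100_temp_get(), not the driver function itself.
 * reject_shadowed = false models the code before the patch,
 * reject_shadowed = true models the code after the patch.
 */
static int sketch_temp_get(uint32_t tsensor, bool reject_shadowed)
{
	uint32_t inttemp = tsensor & TSENSOR_TEMP_MASK;

	/* device SHADOWed: the patch adds an early error return here */
	if ((tsensor & TSENSOR_SHADOWED) && reject_shadowed)
		return -ENODEV;

	/* device valid; ASSUMPTION: the value is scaled with ">> 8" */
	if (tsensor & TSENSOR_VALID)
		return (int)(inttemp >> 8);

	/* ASSUMPTION: a reading without the valid bit is reported as an error */
	return -ENODEV;
}
```

With a raw value of 0xffffffff, which has both the SHADOWed and the valid bit set, `sketch_temp_get(0xffffffffu, false)` yields 511, while `sketch_temp_get(0xffffffffu, true)` yields `-ENODEV`.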
Karol Herbst
2018-Jan-26 14:03 UTC
[Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Well, I just tried to say that you are not fixing the issue you think you're fixing. In your case the GPU is powered off and you get garbage values from any MMIO read, so parsing those values is just wrong. We need to prevent doing anything on the hardware whenever it is powered off, and we need to do that directly in hwmon.
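The alternative described here, refusing the request at the hwmon level before any register is touched, could look roughly like the sketch below. Everything in it is hypothetical: `struct sketch_gpu`, `sketch_rd32`, `sketch_hwmon_temp_read`, and the choice of `-EINVAL` are illustration only and do not correspond to actual nouveau code. The masks and the assumed `>> 8` scaling are the same as in the quoted patch context, and the result is converted to millidegrees Celsius, the unit hwmon uses for temperatures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; none of these names come from nouveau. */
struct sketch_gpu {
	bool powered_on;
	uint32_t tsensor;        /* register content while the GPU is powered */
	unsigned int mmio_reads; /* counts simulated register accesses */
};

/* Simulated MMIO read: a powered-off device answers with all ones. */
static uint32_t sketch_rd32(struct sketch_gpu *gpu)
{
	gpu->mmio_reads++;
	return gpu->powered_on ? gpu->tsensor : 0xffffffffu;
}

/* hwmon-level entry point: bail out before touching the hardware. */
static int sketch_hwmon_temp_read(struct sketch_gpu *gpu, long *millideg)
{
	uint32_t raw;

	if (!gpu->powered_on)
		return -EINVAL; /* error code chosen arbitrarily for this sketch */

	raw = sketch_rd32(gpu);
	if (!(raw & 0x20000000u)) /* valid bit from the quoted hunk */
		return -ENODEV;

	/* ASSUMPTION: ">> 8" scaling; hwmon expects millidegrees Celsius */
	*millideg = (long)((raw & 0x0001fff8u) >> 8) * 1000;
	return 0;
}
```

The point of the design is that the power-state check sits above the sensor code, so a sleeping card never produces an all-ones value that then has to be interpreted.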
Tobias Klausmann
2018-Jan-26 22:27 UTC
[Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Well, fixing the return of wrong values in this function is reasonable by any means. Of course, not reading the memory in the first place would be nice, but deciding this is IMHO not in the scope of a temp_get function; it belongs somewhere in the code calling temp_get.