Martin Peres
2017-Nov-23 22:48 UTC
[Nouveau] Addressing the problem of noisy GPUs under Nouveau
On 23/11/17 10:06, John Hubbard wrote:
> On 11/22/2017 05:07 PM, Martin Peres wrote:
>> Hey,
>>
>> Thanks for your answer, Andy!
>>
>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>> Martin's question was very long, but it boils down to this:
>>>
>>> How do we compute the correct values to write into the e114/e118 pwm
>>> registers based on the VBIOS contents and current state of the board
>>> (like temperature).
>>
>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>> GF119+, or 0x200cd/d0 on Kepler+.
>>
>> At least, it looks like we know which PWM controler we need to drive, so
>> I did not want to muddy the water even more by giving register
>> addresses, rather concentrating on the problem at hand: How to compute
>> the duty value for the PWM controler.
>>
>>>
>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>
>> Yes... So far, we are always safe, but users tend to mind when their
>> computer sound like a jumbo jet at take off... Who would have thought? :D
>>
>> Anyway, looking forward to your answer!
>>
>> Cheers,
>> Martin
>
>
> Hi Martin,
>
> One of our firmware engineers thinks that this looks a lot like PWM inversion.
> For some SKUs, the interpretation of the PWM duty cycle is inverted. That
> would probably make it *very* difficult to find a sensible algorithm that
> covered all the SKUs, given that some are inverted and others are not.
>
> For the noisy GPUs, a very useful experiment would be to try inverting it,
> like this:
>
>     pwmDutyCycle = pwmPeriod - pwmDutyCycle;
>
> ...and then see if fan control starts behaving closer to how you've actually
> programmed it.
>
> Would that be easy enough to try out? It should help narrow down the
> problem at least.
>

Hey John,

Unfortunately, we already know about PWM inversion, and one can tell which mode to use from the GPIO entry associated with the fan (it is flagged as inverted). We have had support for this in Nouveau for a long time. At the very least, this is not the problem on my GF108.

I am certain that the problem I am seeing is related to the vbios table I wrote about (BIT P, offset 0x18). It is used to compute which PWM duty I should use for both 0% and 100% of the fan speed.

Computing the value for 0% fan speed is difficult because of the non-continuous nature of some of the functions[1], but I can always over-approximate. However, I failed to accurately compute the duty I need to write to get 100% fan speed (I have cases where I greatly over-estimate it...).

Could you please check out the vbios table I am pointing at? I am quite sure that your documentation will be clearer than my babbling :D

Thanks,
Martin

[1] http://fs.mupuf.org/nvidia/fan_calib/pwm_offset.png
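For readers following the thread, the relationship being discussed can be sketched as below. This is only an illustration under stated assumptions, not Nouveau's actual code and not NVIDIA's algorithm: the struct `fan_calib`, its fields, and `fan_percent_to_duty()` are hypothetical names, and the linear interpolation between the 0% and 100% duties is an assumption (the thread leaves open how those two bounds are derived from the vbios table). Rounding up mirrors the "over-approximate" approach mentioned above, and the final step is the inversion formula John quoted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical calibration data. How duty_min and duty_max are obtained
 * from the vbios table (BIT P, offset 0x18) is the open question in this
 * thread; here they are simply taken as given. */
struct fan_calib {
    uint32_t period;   /* PWM period, in controller ticks */
    uint32_t duty_min; /* duty assumed to give 0% fan speed */
    uint32_t duty_max; /* duty assumed to give 100% fan speed */
    int inverted;      /* non-zero if the fan GPIO entry is flagged inverted */
};

/* Map a requested fan speed (0..100 %) to a PWM duty value.
 * Assumes a linear response between duty_min and duty_max. */
static uint32_t fan_percent_to_duty(const struct fan_calib *c, uint32_t percent)
{
    assert(c->duty_min <= c->duty_max && c->duty_max <= c->period);

    if (percent > 100)
        percent = 100;

    /* Round up so the result errs towards more cooling, never less. */
    uint32_t span = c->duty_max - c->duty_min;
    uint32_t duty = c->duty_min + (span * percent + 99) / 100;

    /* Inverted boards interpret the duty cycle the other way around. */
    if (c->inverted)
        duty = c->period - duty;

    return duty;
}
```

With a period of 200, a 0% duty of 40 and a 100% duty of 160, a 50% request maps to a duty of 100. If the 100% duty is over-estimated, every intermediate request is scaled up with it, which is consistent with the noisy-fan symptom described in this thread.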
John Hubbard
2017-Nov-28 05:32 UTC
[Nouveau] Addressing the problem of noisy GPUs under Nouveau
On 11/23/2017 02:48 PM, Martin Peres wrote:
> On 23/11/17 10:06, John Hubbard wrote:
>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>> Hey,
>>>
>>> Thanks for your answer, Andy!
>>>
>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>>> Martin's question was very long, but it boils down to this:
>>>>
>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>> registers based on the VBIOS contents and current state of the board
>>>> (like temperature).
>>>
>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>
>>> At least, it looks like we know which PWM controler we need to drive, so
>>> I did not want to muddy the water even more by giving register
>>> addresses, rather concentrating on the problem at hand: How to compute
>>> the duty value for the PWM controler.
>>>
>>>>
>>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>>
>>> Yes... So far, we are always safe, but users tend to mind when their
>>> computer sound like a jumbo jet at take off... Who would have thought? :D
>>>
>>> Anyway, looking forward to your answer!
>>>
>>> Cheers,
>>> Martin
>>
>>
>> Hi Martin,
>>
>> One of our firmware engineers thinks that this looks a lot like PWM inversion.
>> For some SKUs, the interpretation of the PWM duty cycle is inverted. That
>> would probably make it *very* difficult to find a sensible algorithm that
>> covered all the SKUs, given that some are inverted and others are not.
>>
>> For the noisy GPUs, a very useful experiment would be to try inverting it,
>> like this:
>>
>>     pwmDutyCycle = pwmPeriod - pwmDutyCycle;
>>
>> ...and then see if fan control starts behaving closer to how you've actually
>> programmed it.
>>
>> Would that be easy enough to try out? It should help narrow down the
>> problem at least.
>>
>
> Hey John,
>
> Unfortunately, we know about PWM inversion, and one can know which mode
> to use based on the GPIO entry associated to the fan (inverted). We have
> had support for this in Nouveau for a long time. At the very least, this
> is not the problem on my GF108.
>
> I am certain that the problem I am seeing is related to this vbios table
> I wrote about (BIT P, offset 0x18). It is used to compute what PWM duty
> I should use for both 0 and 100% of the fan speed.
>
> Computing the value for 0% fan speed is difficult because of
> non-continuous nature of some of the functions[1], but I can always
> over-approximate. However, I failed to accurately compute the duty I
> need to write to get the 100% fan speed (I have cases where I greatly
> over-estimate it...).
>
> Could you please check out the vbios table I am pointing at? I am quite
> sure that your documentation will be clearer than my babbling :D

Yes. We will check on this. There has been some productive discussion
internally, but it will take some more investigation.

thanks,
John Hubbard

>
> Thanks,
> Martin
>
> [1] http://fs.mupuf.org/nvidia/fan_calib/pwm_offset.png
>
Martin Peres
2018-Jan-28 23:24 UTC
[Nouveau] Addressing the problem of noisy GPUs under Nouveau
On 28/11/17 07:32, John Hubbard wrote:
> On 11/23/2017 02:48 PM, Martin Peres wrote:
>> On 23/11/17 10:06, John Hubbard wrote:
>>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>>> Hey,
>>>>
>>>> Thanks for your answer, Andy!
>>>>
>>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger <aritger at nvidia.com> wrote:
>>>>> Martin's question was very long, but it boils down to this:
>>>>>
>>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>>> registers based on the VBIOS contents and current state of the board
>>>>> (like temperature).
>>>>
>>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>>
>>>> At least, it looks like we know which PWM controler we need to drive, so
>>>> I did not want to muddy the water even more by giving register
>>>> addresses, rather concentrating on the problem at hand: How to compute
>>>> the duty value for the PWM controler.
>>>>
>>>>>
>>>>> We generally do this right, but appear to get it extra-wrong for certain GPUs.
>>>>
>>>> Yes... So far, we are always safe, but users tend to mind when their
>>>> computer sound like a jumbo jet at take off... Who would have thought? :D
>>>>
>>>> Anyway, looking forward to your answer!
>>>>
>>>> Cheers,
>>>> Martin
>>>
>>>
>>> Hi Martin,
>>>
>>> One of our firmware engineers thinks that this looks a lot like PWM inversion.
>>> For some SKUs, the interpretation of the PWM duty cycle is inverted. That
>>> would probably make it *very* difficult to find a sensible algorithm that
>>> covered all the SKUs, given that some are inverted and others are not.
>>>
>>> For the noisy GPUs, a very useful experiment would be to try inverting it,
>>> like this:
>>>
>>>     pwmDutyCycle = pwmPeriod - pwmDutyCycle;
>>>
>>> ...and then see if fan control starts behaving closer to how you've actually
>>> programmed it.
>>>
>>> Would that be easy enough to try out? It should help narrow down the
>>> problem at least.
>>>
>>
>> Hey John,
>>
>> Unfortunately, we know about PWM inversion, and one can know which mode
>> to use based on the GPIO entry associated to the fan (inverted). We have
>> had support for this in Nouveau for a long time. At the very least, this
>> is not the problem on my GF108.
>>
>> I am certain that the problem I am seeing is related to this vbios table
>> I wrote about (BIT P, offset 0x18). It is used to compute what PWM duty
>> I should use for both 0 and 100% of the fan speed.
>>
>> Computing the value for 0% fan speed is difficult because of
>> non-continuous nature of some of the functions[1], but I can always
>> over-approximate. However, I failed to accurately compute the duty I
>> need to write to get the 100% fan speed (I have cases where I greatly
>> over-estimate it...).
>>
>> Could you please check out the vbios table I am pointing at? I am quite
>> sure that your documentation will be clearer than my babbling :D
>
> Yes. We will check on this. There has been some productive discussion
> internally, but it will take some more investigation.
>
> thanks,
> John Hubbard

Have the productive discussions panned out?

Thanks in advance,
Martin