Mark Menzynski
2019-Sep-06 11:43 UTC
[Nouveau] [PATCH v3] clk: Restore BYPASS_PLL_CHECK from PLLs
I have looked at problem with Fermi GPUs where changing to higher clock led to really bad perfomance (with GpuTest 20x worse perfomance) and later also crashes of the nouveau. It seemed to be affected by Shader Clock in Voltage Entries in the video BIOS. Disabling BYPASS_PLL_CHECK in CLK0_CTRL seems to completely fix the issue. I have tried to search this BYPASS_PLL_CHECK in Nvidia traces but seemed it wasn't used nowhere for CLK settings. Removing this works fine, but I don't know what it's really for. Actual bit setting this BYPASS_PLL_CHECK is on 0x10: lookup -ac0 0x137000 0x10 PCLOCK.CLK0_CTRL => { BYPASS_PLL_CHECK | UNK12 = 0 } Also, disabling this bit on other CLKs doesn't seem to break anything. v2: add back PLL lock test v3: add restoring original value after PLL lock test Signed-off-by: Mark Menzynski <mmenzyns at redhat.com> --- drm/nouveau/nvkm/subdev/clk/gf100.c | 7 +++++-- drm/nouveau/nvkm/subdev/clk/gk104.c | 5 ++++- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drm/nouveau/nvkm/subdev/clk/gf100.c b/drm/nouveau/nvkm/subdev/clk/gf100.c index 7f67f9f5..a97af0e9 100644 --- a/drm/nouveau/nvkm/subdev/clk/gf100.c +++ b/drm/nouveau/nvkm/subdev/clk/gf100.c @@ -368,6 +368,7 @@ gf100_clk_prog_2(struct gf100_clk *clk, int idx) struct gf100_clk_info *info = &clk->eng[idx]; struct nvkm_device *device = clk->base.subdev.device; const u32 addr = 0x137000 + (idx * 0x20); + bool bypass_state = false; if (idx <= 7) { nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000000); nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000000); @@ -376,12 +377,14 @@ gf100_clk_prog_2(struct gf100_clk *clk, int idx) nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000001); /* Test PLL lock */ + bypass_state = nvkm_rd32(device, addr + 0x00) & 0x00000010; nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000000); nvkm_msec(device, 2000, if (nvkm_rd32(device, addr + 0x00) & 0x00020000) break; ); - nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); + if (bypass_state) + nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); /* Enable sync mode */ nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000004); @@ -476,5 +479,5 @@ gf100_clk_new(struct nvkm_device *device, int index, struct nvkm_clk **pclk) return -ENOMEM; *pclk = &clk->base; - return nvkm_clk_ctor(&gf100_clk, device, index, false, &clk->base); + return nvkm_clk_ctor(&gf100_clk, device, index, true, &clk->base); } diff --git a/drm/nouveau/nvkm/subdev/clk/gk104.c b/drm/nouveau/nvkm/subdev/clk/gk104.c index 0b37e3da..c9ede404 100644 --- a/drm/nouveau/nvkm/subdev/clk/gk104.c +++ b/drm/nouveau/nvkm/subdev/clk/gk104.c @@ -388,6 +388,7 @@ gk104_clk_prog_2(struct gk104_clk *clk, int idx) struct gk104_clk_info *info = &clk->eng[idx]; struct nvkm_device *device = clk->base.subdev.device; const u32 addr = 0x137000 + (idx * 0x20); + bool bypass_state = false; nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000000); nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000000); if (info->coef) { @@ -395,12 +396,14 @@ gk104_clk_prog_2(struct gk104_clk *clk, int idx) nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000001); /* Test PLL lock */ + bypass_state = nvkm_rd32(device, addr + 0x00); nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000000); nvkm_msec(device, 2000, if (nvkm_rd32(device, addr + 0x00) & 0x00020000) break; ); - nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); + if (bypass_state) + nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); /* Enable sync mode */ nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000004); -- 2.21.0
Roy Spliet
2019-Sep-06 16:24 UTC
[Nouveau] [PATCH v3] clk: Restore BYPASS_PLL_CHECK from PLLs
(Accidentally replied in person instead of to the list. Here a ~verbatim copy of my e-mail earlier) Briefly, the BYPASS_PLL_CHECK bit disables the PLL locking test when set to 1. PLLs should still lock (or not, if the params are wildly out-of-bounds), but the test is bypassed and disabled when the bit is set. This bit was definitely used on NVA3/5/8 by the blob as I confirmed ploughing through traces. I recall seeing the same behaviour (disabling) on early Fermi's, but I'd have to dig a whole lot deeper through old trace to confirm. As for why NVIDIA disables this test after verifying the PLL locked, my highly speculative guess back then was just power savings. As to why nouveau does it: when in Rome... The fact that this makes a difference in stability on GF119 makes me wonder whether the PLL control register has a different layout on these GPUs. The BYPASS_PLL_CHECK bit might no longer be a BYPASS_PLL_CHECK bit, but have a completely different meaning. As this gen was centimetring towards Kepler, you might want to: 1) investigate behaviour of this bit and the rest of the control bits on an idle GF119. Got a machine with an Intel/AMD IGP? Good, stick the GF119 in as a secondary GPU and you can trace while messing with the VDEC PLL or some other insignificant domain's PLL manually without hanging the card or your machine, 2) Look at Kepler code either in nouveau or nvgpu to see if their PLL reconfiguration code matches your traces closer, 3) Be cheeky and ask NVIDIA. Depending on further investigation you may have to create a new clk/gf119.c subdev to facilitate potential differences between earlier Fermis and this one. Roy On 06/09/2019 12:43, Mark Menzynski wrote:> I have looked at problem with Fermi GPUs where changing to higher clock > led to really bad perfomance (with GpuTest 20x worse perfomance) and later also crashes of the nouveau. It seemed > to be affected by Shader Clock in Voltage Entries in the video BIOS. Disabling > BYPASS_PLL_CHECK in CLK0_CTRL seems to completely fix the issue. I have > tried to search this BYPASS_PLL_CHECK in Nvidia traces but seemed it > wasn't used nowhere for CLK settings. > > Removing this works fine, but I don't know what it's really for. > Actual bit setting this BYPASS_PLL_CHECK is on 0x10: > lookup -ac0 0x137000 0x10 > PCLOCK.CLK0_CTRL => { BYPASS_PLL_CHECK | UNK12 = 0 } > Also, disabling this bit on other CLKs doesn't seem to break anything. > > v2: add back PLL lock test > v3: add restoring original value after PLL lock test > > Signed-off-by: Mark Menzynski <mmenzyns at redhat.com> > --- > drm/nouveau/nvkm/subdev/clk/gf100.c | 7 +++++-- > drm/nouveau/nvkm/subdev/clk/gk104.c | 5 ++++- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drm/nouveau/nvkm/subdev/clk/gf100.c b/drm/nouveau/nvkm/subdev/clk/gf100.c > index 7f67f9f5..a97af0e9 100644 > --- a/drm/nouveau/nvkm/subdev/clk/gf100.c > +++ b/drm/nouveau/nvkm/subdev/clk/gf100.c > @@ -368,6 +368,7 @@ gf100_clk_prog_2(struct gf100_clk *clk, int idx) > struct gf100_clk_info *info = &clk->eng[idx]; > struct nvkm_device *device = clk->base.subdev.device; > const u32 addr = 0x137000 + (idx * 0x20); > + bool bypass_state = false; > if (idx <= 7) { > nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000000); > nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000000); > @@ -376,12 +377,14 @@ gf100_clk_prog_2(struct gf100_clk *clk, int idx) > nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000001); > > /* Test PLL lock */ > + bypass_state = nvkm_rd32(device, addr + 0x00) & 0x00000010; > nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000000); > nvkm_msec(device, 2000, > if (nvkm_rd32(device, addr + 0x00) & 0x00020000) > break; > ); > - nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); > + if (bypass_state) > + nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); > > /* Enable sync mode */ > nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000004); > @@ -476,5 +479,5 @@ gf100_clk_new(struct nvkm_device *device, int index, struct nvkm_clk **pclk) > return -ENOMEM; > *pclk = &clk->base; > > - return nvkm_clk_ctor(&gf100_clk, device, index, false, &clk->base); > + return nvkm_clk_ctor(&gf100_clk, device, index, true, &clk->base); > } > diff --git a/drm/nouveau/nvkm/subdev/clk/gk104.c b/drm/nouveau/nvkm/subdev/clk/gk104.c > index 0b37e3da..c9ede404 100644 > --- a/drm/nouveau/nvkm/subdev/clk/gk104.c > +++ b/drm/nouveau/nvkm/subdev/clk/gk104.c > @@ -388,6 +388,7 @@ gk104_clk_prog_2(struct gk104_clk *clk, int idx) > struct gk104_clk_info *info = &clk->eng[idx]; > struct nvkm_device *device = clk->base.subdev.device; > const u32 addr = 0x137000 + (idx * 0x20); > + bool bypass_state = false; > nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000000); > nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000000); > if (info->coef) { > @@ -395,12 +396,14 @@ gk104_clk_prog_2(struct gk104_clk *clk, int idx) > nvkm_mask(device, addr + 0x00, 0x00000001, 0x00000001); > > /* Test PLL lock */ > + bypass_state = nvkm_rd32(device, addr + 0x00); > nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000000); > nvkm_msec(device, 2000, > if (nvkm_rd32(device, addr + 0x00) & 0x00020000) > break; > ); > - nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); > + if (bypass_state) > + nvkm_mask(device, addr + 0x00, 0x00000010, 0x00000010); > > /* Enable sync mode */ > nvkm_mask(device, addr + 0x00, 0x00000004, 0x00000004); >