Ilia Mirkin
2015-May-20 20:01 UTC
[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
Some newer chips have trouble coming up, and we get bad MMIO reads from
them, like 0xbadf100. This ends up translating into crazy amounts of
VRAM, which destroys all sorts of other logic down the line. Instead,
fail device init.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: stable at kernel.org
---
 drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
index de9f395..9d4d196 100644
--- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
+++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
@@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
 		}
 	}
 
+	/* if over 1TB of VRAM is reported, something went very wrong, bail */
+	if (ram->size > (1ULL << 40)) {
+		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
+		return -EINVAL;
+	}
+
 	/* if all controllers have the same amount attached, there's no holes */
 	if (uniform) {
 		offset = rsvd_head;
-- 
2.3.6
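[Editor's note: the patch's sanity check, extracted into a standalone, testable sketch. The threshold and the idea of rejecting sizes above 1 TiB (1ULL << 40 bytes) come straight from the diff above; the function name and the shape of a freestanding helper are illustrative, not nouveau tree code.]

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the patch's sanity check: any reported VRAM size above
 * 1 TiB on a GF100-era board can only come from a bogus register
 * read, so treat it as invalid rather than believing it. */
static bool vram_size_is_sane(uint64_t size_bytes)
{
	return size_bytes <= (1ULL << 40);
}
```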
Tobias Klausmann
2015-May-20 21:35 UTC
[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
Any idea on how to solve the problem, other than just reporting it?

But for now this adds a helpful error message... you may add my R-b.

On 20.05.2015 22:01, Ilia Mirkin wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> Cc: stable at kernel.org
> ---
>  drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
> 		}
> 	}
>
> +	/* if over 1TB of VRAM is reported, something went very wrong, bail */
> +	if (ram->size > (1ULL << 40)) {
> +		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> +		return -EINVAL;
> +	}
> +
> 	/* if all controllers have the same amount attached, there's no holes */
> 	if (uniform) {
> 		offset = rsvd_head;
Ilia Mirkin
2015-May-20 21:47 UTC
[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
Someone will have to trudge through a mmiotrace and figure out what magic
bit we need to set in order to bring it out of deep sleep. Or perhaps
NVIDIA will graciously tell us, which they eventually did for
GK104/GK106 (but their instructions appear to be insufficient for at
least some GK106s).

But I've seen these errors every so often on various cards... we stick
things at the end of VRAM, which causes no end of confusion when we
think that it's a few PB out :)

On Wed, May 20, 2015 at 5:35 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Any idea on how to solve the problem, other than just reporting it?
>
> But for now this adds a helpful error message... you may add my R-b.
>
>
> On 20.05.2015 22:01, Ilia Mirkin wrote:
>>
>> Some newer chips have trouble coming up, and we get bad MMIO reads from
>> them, like 0xbadf100. This ends up translating into crazy amounts of
>> VRAM, which destroys all sorts of other logic down the line. Instead,
>> fail device init.
>>
>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>> Cc: stable at kernel.org
>> ---
>> drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> index de9f395..9d4d196 100644
>> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
>> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct
>> nvkm_object *engine,
>> 		}
>> 	}
>>
>> +	/* if over 1TB of VRAM is reported, something went very wrong, bail */
>> +	if (ram->size > (1ULL << 40)) {
>> +		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
>> +		return -EINVAL;
>> +	}
>> +
>> 	/* if all controllers have the same amount attached, there's no holes */
>> 	if (uniform) {
>> 		offset = rsvd_head;
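[Editor's note: a standalone sketch of how one poisoned MMIO read "translates into crazy amounts of VRAM", as described in the thread. The assumption that per-partition amounts are accumulated in MiB is illustrative only; the helper name and units are hypothetical, not taken from the nouveau tree.]

```c
#include <stdint.h>

/* Illustration: sum per-partition memory amounts (assumed here to be
 * in MiB) into a total byte count. If the GPU is in a bad state and
 * one register read returns a poison value such as 0xbadf100, the
 * total blows up to hundreds of TiB, which is exactly what the
 * patch's >1TiB sanity check is meant to catch before the bogus size
 * corrupts later placement logic. */
static uint64_t total_vram_bytes(const uint32_t *mib_per_part, int nparts)
{
	uint64_t total = 0;

	for (int i = 0; i < nparts; i++)
		total += (uint64_t)mib_per_part[i] << 20; /* MiB -> bytes */
	return total;
}
```

For example, two healthy partitions of 1024 MiB each give 2 GiB, while a single 0xbadf100 read pushes the total far past the 1 TiB threshold.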
Ben Skeggs
2015-May-21 04:45 UTC
[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
On 21 May 2015 at 06:01, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
Hrm, I'm not sure what I think of doing something like this. Where do
we draw the line at validating stuff we read from GPU registers?

Either way, we still have a bug, so I'm not sure what we gain from
working around it like this.

Ben.

>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> Cc: stable at kernel.org
> ---
> drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
> 		}
> 	}
>
> +	/* if over 1TB of VRAM is reported, something went very wrong, bail */
> +	if (ram->size > (1ULL << 40)) {
> +		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> +		return -EINVAL;
> +	}
> +
> 	/* if all controllers have the same amount attached, there's no holes */
> 	if (uniform) {
> 		offset = rsvd_head;
> --
> 2.3.6
>
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau