Ilia Mirkin
2015-Sep-07 20:00 UTC
[Nouveau] [PATCH mesa 3/3] nv30: Disable msaa for now because it causes lockups
On Mon, Sep 7, 2015 at 3:50 PM, Hans de Goede <hdegoede at redhat.com> wrote:
> msaa use on nv30 may trigger a (mesa?) bug where dmesg says:
> [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
> [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
> [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
> [ 1201.766955] nouveau E[soffice.bin[3785]] fail ttm_validate
> [ 1201.766961] nouveau E[soffice.bin[3785]] validating bo list
> [ 1201.766968] nouveau E[soffice.bin[3785]] validate: -12
>
> After which the program using the msaa visual freezes, and eventually
> the entire system freezes. Disable msaa until this is fixed.
>
> This happens on both nv3x and nv4x cards.

Ugh. This is aka "you ran out of vram, goodbye". We don't really
handle that case extremely well. I feel really bad doing this :( The
issue is anachronistic applications like soffice that don't keep with
the limitations of GPUs of the days of yore. So we end up penalizing
people who do use applications of the day.

But the practical issue is that people do upgrade, and people do run
these applications, and so it makes sense to keep it off by default.
Could I convince you to use debug_get_int_option (or something along
those lines, forget the function name) to still allow an env var
override? Like NV30_MAX_MSAA or something (and clamp it to 4 so people
don't get ideas).

>
> Signed-off-by: Hans de Goede <hdegoede at redhat.com>
> ---
>  src/gallium/drivers/nouveau/nv30/nv30_screen.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> index 7aad26b..7a16e72 100644
> --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
> @@ -319,8 +319,25 @@ nv30_screen_is_format_supported(struct pipe_screen *pscreen,
>                                  unsigned sample_count,
>                                  unsigned bindings)
>  {
> +   /*
> +    * msaa use on nv30 may trigger a (mesa?) bug where dmesg says:
> +    * [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
> +    * [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
> +    * [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
> +    * [ 1201.766955] nouveau E[soffice.bin[3785]] fail ttm_validate
> +    * [ 1201.766961] nouveau E[soffice.bin[3785]] validating bo list
> +    * [ 1201.766968] nouveau E[soffice.bin[3785]] validate: -12
> +    *
> +    * After which the program using the msaa visual freezes, and eventually
> +    * the entire system freezes. Disable msaa until this is fixed.
> +    */
> +#if 1
> +   if (sample_count > 0)
> +#else
>     if (sample_count > 4)
> +#endif
>        return false;
> +
>     if (!(0x00000017 & (1 << sample_count)))
>        return false;
>
> --
> 2.4.3
>
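The env var override suggested in this message could look roughly like the sketch below. It is a standalone illustration, not the actual v2 patch: `env_int_option`, `clamp_max_msaa` and `sample_count_supported` are hypothetical names, and plain `getenv`/`strtol` stands in for Mesa's debug option helper (probably `debug_get_num_option` from `u_debug.h`, but treat that as an assumption). `NV30_MAX_MSAA` is the variable name floated above. The mask `0x00000017` has bits 0, 1, 2 and 4 set, so only sample counts 0, 1, 2 and 4 pass the second check.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for Mesa's debug option helper: read an integer
 * from the environment, returning dfault when the variable is unset,
 * empty, or not a plain decimal number. */
static long
env_int_option(const char *name, long dfault)
{
   const char *str = getenv(name);
   char *end;
   long val;

   if (!str || !*str)
      return dfault;

   val = strtol(str, &end, 10);
   return (*end == '\0') ? val : dfault;
}

/* Clamp the requested maximum to 0..4, so the override cannot ask for
 * more samples than the driver accepted before the patch. */
static unsigned
clamp_max_msaa(long requested)
{
   if (requested < 0)
      return 0;
   if (requested > 4)
      return 4;
   return (unsigned)requested;
}

/* Mirror of the sample-count checks in the patch, with the hard-coded
 * limit replaced by a caller-supplied maximum. */
static bool
sample_count_supported(unsigned sample_count, unsigned max_samples)
{
   /* The "> 4" guard also keeps the shift below in range. */
   if (sample_count > max_samples || sample_count > 4)
      return false;

   /* 0x17 == 0b10111: only 0, 1, 2 and 4 samples are valid. */
   return (0x00000017u & (1u << sample_count)) != 0;
}
```

A driver would compute the limit once at screen creation, for example `clamp_max_msaa(env_int_option("NV30_MAX_MSAA", 0))`, so that MSAA stays off unless the user opts in.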
Hans de Goede
2015-Sep-08 08:00 UTC
[Nouveau] [PATCH mesa 3/3] nv30: Disable msaa for now because it causes lockups
Hi,

On 07-09-15 22:00, Ilia Mirkin wrote:
> On Mon, Sep 7, 2015 at 3:50 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>> msaa use on nv30 may trigger a (mesa?) bug where dmesg says:
>> [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
>> [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
>> [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
>> [ 1201.766955] nouveau E[soffice.bin[3785]] fail ttm_validate
>> [ 1201.766961] nouveau E[soffice.bin[3785]] validating bo list
>> [ 1201.766968] nouveau E[soffice.bin[3785]] validate: -12
>>
>> After which the program using the msaa visual freezes, and eventually
>> the entire system freezes. Disable msaa until this is fixed.
>>
>> This happens on both nv3x and nv4x cards.
>
> Ugh. This is aka "you ran out of vram, goodbye".

Ah right, 12 == ENOMEM. This also explains why I can reproduce this much
more easily on a 64 MB nv34 card than on my nv46 card, which has more RAM.

> We don't really
> handle that case extremely well. I feel really bad doing this :( The
> issue is anachronistic applications like soffice that don't keep with
> the limitations of GPUs of the days of yore. So we end up penalizing
> people who do use applications of the day.
>
> But the practical issue is that people do upgrade, and people do run
> these applications, and so it makes sense to keep it off by default.
> Could I convince you to use debug_get_int_option (or something along
> those lines, forget the function name) to still allow an env var
> override? Like NV30_MAX_MSAA or something (and clamp it to 4 so people
> don't get ideas).

Using debug_get_int_option and defaulting it to 0 sounds like a good
idea. I'll do a v2 using this, and I'll update the comment / commit
message to reflect that this is caused by the applications causing
us to go oom.

Regards,

Hans
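The "12 == ENOMEM" reading of the dmesg lines can be checked with a few lines of C. This is only a sketch: `describe_kernel_error` and `is_out_of_memory` are hypothetical helpers, and the value 12 for `ENOMEM` is what Linux uses (other platforms may number errno values differently). Kernel code conventionally returns failures as negative errno values, which is why the log shows `validate: -12`.

```c
#include <errno.h>
#include <string.h>

/* Turn a kernel-style negative return code (e.g. -12) into the
 * C library's message for the corresponding errno value. */
static const char *
describe_kernel_error(int ret)
{
   return strerror(ret < 0 ? -ret : ret);
}

/* True when the return code is the negative form of ENOMEM. */
static int
is_out_of_memory(int ret)
{
   return ret == -ENOMEM;
}
```

Passing -12 to `describe_kernel_error` returns the platform's out-of-memory message, which matches the "ran out of vram" diagnosis above.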
Ben Skeggs
2015-Sep-08 08:48 UTC
[Nouveau] [PATCH mesa 3/3] nv30: Disable msaa for now because it causes lockups
On 8 September 2015 at 18:00, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
> On 07-09-15 22:00, Ilia Mirkin wrote:
>>
>> On Mon, Sep 7, 2015 at 3:50 PM, Hans de Goede <hdegoede at redhat.com> wrote:
>>>
>>> msaa use on nv30 may trigger a (mesa?) bug where dmesg says:
>>> [ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
>>> [ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
>>> [ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
>>> [ 1201.766955] nouveau E[soffice.bin[3785]] fail ttm_validate
>>> [ 1201.766961] nouveau E[soffice.bin[3785]] validating bo list
>>> [ 1201.766968] nouveau E[soffice.bin[3785]] validate: -12
>>>
>>> After which the program using the msaa visual freezes, and eventually
>>> the entire system freezes. Disable msaa until this is fixed.
>>>
>>> This happens on both nv3x and nv4x cards.
>>
>>
>> Ugh. This is aka "you ran out of vram, goodbye".
>
>
> Ah right 12 == ENOMEM. This also explains why I can reproduce this much
> easier on a 64 MB nv34 card than on my nv46 card which has more RAM.

I wonder if we can "solve" this one by flushing more often etc in the
3D driver, it's really hard to say without knowing the (set of)
operation(s) that the kernel is rejecting though...

>
>> We don't really
>> handle that case extremely well. I feel really bad doing this :( The
>> issue is anachronistic applications like soffice that don't keep with
>> the limitations of GPUs of the days of yore. So we end up penalizing
>> people who do use applications of the day.
>>
>> But the practical issue is that people do upgrade, and people do run
>> these applications, and so it makes sense to keep it off by default.
>> Could I convince you to use debug_get_int_option (or something along
>> those lines, forget the function name) to still allow an env var
>> override? Like NV30_MAX_MSAA or something (and clamp it to 4 so people
>> don't get ideas).
>
>
> Using debug_get_int_option and defaulting it to 0 sounds like a good
> idea, I'll do a v2 using this, and I'll update the comment / commit
> message to reflect that this is caused by the applications causing
> us to go oom.
>
> Regards,
>
> Hans
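As a rough illustration of why a 64 MB card hits the out-of-memory case so much sooner, the footprint of a multisampled surface can be estimated with simple arithmetic. This is a back-of-envelope sketch under stated assumptions, not a description of how the driver allocates: it assumes every sample is stored in full, and it ignores alignment, tiling and any other buffers. `msaa_surface_bytes` is a hypothetical helper, and the window size used in the note below is made up for the example.

```c
#include <stdint.h>

/* Estimated size in bytes of one surface that stores every sample in
 * full. A sample count of 0 is treated like 1 (no multisampling). */
static uint64_t
msaa_surface_bytes(uint32_t width, uint32_t height,
                   uint32_t samples, uint32_t bytes_per_sample)
{
   uint32_t effective_samples = samples ? samples : 1;

   return (uint64_t)width * height * effective_samples * bytes_per_sample;
}
```

For a hypothetical 1920x1080 window with 4 samples and 4 bytes per sample, one surface comes to 33,177,600 bytes. A color and depth pair would then need 66,355,200 bytes, against 67,108,864 bytes on a 64 MiB card, leaving almost nothing for everything else.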