Kees Cook
2020-Nov-24 21:32 UTC
[Nouveau] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> Really, no ... something which produces no improvement has no value at
> all ... we really shouldn't be wasting maintainer time with it because
> it has a cost to merge. I'm not sure we understand where the balance
> lies in value vs cost to merge but I am confident in the zero value
> case.

What? We can't measure how many future bugs aren't introduced because
the kernel requires explicit case flow-control statements for all new
code.

We already enable -Wimplicit-fallthrough globally, so that's not the
discussion. The issue is that Clang is (correctly) even more strict
than GCC for this, so these are the remaining ones to fix for full
Clang coverage too.

People have spent more time debating this already than it would have
taken to apply the patches. :)

This is about robustness and language wrangling. It's a big code-base,
and this is the price of our managing technical debt for permanent
robustness improvements. (The numbers I ran from Gustavo's earlier
patches were that about 10% of the places adjusted were identified as
legitimate bugs being fixed. This final series may be lower, but there
are still bugs being found from it -- we need to finish this and shut
the door on it for good.)

--
Kees Cook
James Bottomley
2020-Nov-25 07:05 UTC
[Nouveau] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote:
> On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> > Really, no ... something which produces no improvement has no value
> > at all ... we really shouldn't be wasting maintainer time with it
> > because it has a cost to merge. I'm not sure we understand where
> > the balance lies in value vs cost to merge but I am confident in
> > the zero value case.
>
> What? We can't measure how many future bugs aren't introduced because
> the kernel requires explicit case flow-control statements for all new
> code.

No but we can measure how vulnerable our current coding habits are to
the mistake this warning would potentially prevent. I don't think it's
wrong to extrapolate that if we had no instances at all of prior coding
problems we likely wouldn't have any in future either making adopting
the changes needed to enable the warning valueless ... that's the zero
value case I was referring to above.

Now, what we have seems to be about 6 cases (at least what's been shown
in this thread) where a missing break would cause potentially user
visible issues. That means the value of this isn't zero, but it's not
a no-brainer massive win either. That's why I think asking what we've
invested vs the return isn't a useless exercise.

> We already enable -Wimplicit-fallthrough globally, so that's not the
> discussion. The issue is that Clang is (correctly) even more strict
> than GCC for this, so these are the remaining ones to fix for full
> Clang coverage too.
>
> People have spent more time debating this already than it would have
> taken to apply the patches. :)

You mean we've already spent 90% of the effort to come this far so we
might as well go the remaining 10% because then at least we get some
return? It's certainly a clinching argument in defence procurement ...

> This is about robustness and language wrangling. It's a big code-base,
> and this is the price of our managing technical debt for
> permanent robustness improvements. (The numbers I ran from Gustavo's
> earlier patches were that about 10% of the places adjusted were
> identified as legitimate bugs being fixed. This final series may be
> lower, but there are still bugs being found from it -- we need to
> finish this and shut the door on it for good.)

I got my six patches by analyzing the lwn.net report of the fixes that
was cited which had 21 of which 50% didn't actually change the emitted
code, and 25% didn't have a user visible effect.

But the broader point I'm making is just because the compiler people
come up with a shiny new warning doesn't necessarily mean the problem
it's detecting is one that causes us actual problems in the code base.
I'd really be happier if we had a theory about what classes of CVE or
bug we could eliminate before we embrace the next new warning.

James
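For readers following along, here is a minimal sketch of the kind of
"missing break" with a user-visible effect that the messages above are
counting. Everything in it (the enum, the function names, the numbers)
is hypothetical and chosen only for illustration; it is not taken from
any of the patches in the series.

```c
/* Hypothetical example: none of these names come from the kernel. */
enum mode { MODE_OFF, MODE_LOW, MODE_HIGH };

/* Buggy variant: MODE_LOW has no break, so it runs MODE_HIGH's body too. */
int power_buggy(enum mode m)
{
	int mw = 0;

	switch (m) {
	case MODE_LOW:
		mw = 100;
		/* break is missing here: execution continues into MODE_HIGH */
	case MODE_HIGH:
		mw = 500;
		break;
	case MODE_OFF:
		break;
	}
	return mw;
}

/* Fixed variant: every case ends in an explicit flow-control statement. */
int power_fixed(enum mode m)
{
	int mw = 0;

	switch (m) {
	case MODE_LOW:
		mw = 100;
		break;
	case MODE_HIGH:
		mw = 500;
		break;
	case MODE_OFF:
		break;
	}
	return mw;
}
```

In this sketch the buggy variant silently reports the MODE_HIGH value
for MODE_LOW, which is the sort of behaviour change a reviewer can
easily miss and a fall-through warning is meant to flag.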
Nick Desaulniers
2020-Nov-25 12:24 UTC
[Nouveau] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Tue, Nov 24, 2020 at 11:05 PM James Bottomley
<James.Bottomley at hansenpartnership.com> wrote:
>
> On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote:
> > We already enable -Wimplicit-fallthrough globally, so that's not the
> > discussion. The issue is that Clang is (correctly) even more strict
> > than GCC for this, so these are the remaining ones to fix for full
> > Clang coverage too.
> >
> > People have spent more time debating this already than it would have
> > taken to apply the patches. :)
>
> You mean we've already spent 90% of the effort to come this far so we
> might as well go the remaining 10% because then at least we get some
> return? It's certainly a clinching argument in defence procurement ...

So developers and distributions using Clang can't have
-Wimplicit-fallthrough enabled because GCC is less strict (which has
been shown in this thread to lead to bugs)? We'd like to have nice
things too, you know.

I even agree that most of the churn comes from

    case 0:
      ++x;
    default:
      break;

which I have a patch for: https://reviews.llvm.org/D91895. I agree
that can never lead to bugs. But that's not the sole case of this
series, just most of them.

Though, note how the reviewer (C++ spec editor and clang front end
owner) in https://reviews.llvm.org/D91895 even asks in that review how
maybe a new flag would be more appropriate for a watered
down/stylistic variant of the existing behavior. And if the current
wording of Documentation/process/deprecated.rst around "fallthrough"
is a straightforward rule of thumb, I kind of agree with him.

>
> > This is about robustness and language wrangling. It's a big code-base,
> > and this is the price of our managing technical debt for
> > permanent robustness improvements. (The numbers I ran from Gustavo's
> > earlier patches were that about 10% of the places adjusted were
> > identified as legitimate bugs being fixed. This final series may be
> > lower, but there are still bugs being found from it -- we need to
> > finish this and shut the door on it for good.)
>
> I got my six patches by analyzing the lwn.net report of the fixes that
> was cited which had 21 of which 50% didn't actually change the emitted
> code, and 25% didn't have a user visible effect.
>
> But the broader point I'm making is just because the compiler people
> come up with a shiny new warning doesn't necessarily mean the problem

That's not what this is though; you're attacking a strawman. I'd
encourage you to bring that up when that actually occurs, unlike this
case since it's actively hindering getting -Wimplicit-fallthrough
enabled for Clang. This is not a shiny new warning; it's already on
for GCC and has existed in both compilers for multiple releases.

And I'll also note that warnings are warnings and not errors because
they cannot be proven to be bugs in 100% of cases, but they have led
to bugs in the past. They require a human to review their intent and
remove ambiguities. If 97% of cases would end in a break ("Expert C
Programming: Deep C Secrets" - Peter van der Linden), then it starts
to look to me like a language defect; certainly an incorrectly chosen
default. But the compiler can't know those 3% were intentional, unless
you're explicit for those exceptional cases.

> it's detecting is one that causes us actual problems in the code base.
> I'd really be happier if we had a theory about what classes of CVE or
> bug we could eliminate before we embrace the next new warning.

We don't generally file CVEs and waiting for them to occur might be
too reactive, but I agree that pointing to some additional
documentation in commit messages about how a warning could lead to a
bug would make it clearer to reviewers why being able to enable it
treewide, even if there's no bug in their particular subsystem, is in
the general interest of the commons.
On Mon, Nov 23, 2020 at 7:58 AM James Bottomley
<James.Bottomley at hansenpartnership.com> wrote:
>
> We're also complaining about the inability to recruit maintainers:
>
> https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/
>
> And burn out:
>
> http://antirez.com/news/129
>
> The whole crux of your argument seems to be maintainers' time isn't
> important so we should accept all trivial patches ... I'm pushing back
> on that assumption in two places, firstly the valuelessness of the time
> and secondly that all trivial patches are valuable.

It's critical to the longevity of any open source project that there
are not single points of failure. If someone is not expendable or
replaceable (or claims to be) then that's a risk to the project and a
bottleneck. Not having a replacement in training or some form of
redundancy is short sighted.

If trivial patches are adding too much to your workload, consider
training a co-maintainer or asking for help from one of your reviewers
whom you trust. I don't doubt it's hard to find maintainers, but
existing maintainers should go out of their way to entrust
co-maintainers especially when they find their workload becomes too
high. And reviewing/picking up trivial patches is probably a great way
to get started. If we allow too much knowledge of any one subsystem to
collect with one maintainer, what happens when that maintainer leaves
the community (which, given a finite lifespan, is an inevitability)?

--
Thanks,
~Nick Desaulniers
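The "case 0: ++x; default: break;" pattern mentioned in the message
above can be sketched as follows. This is only an illustration with
made-up function names, showing why falling through into a label whose
body is just "break;" behaves the same as writing the break explicitly.
Whether a given compiler version warns on the first form is not
something this sketch tries to settle.

```c
/* Hypothetical helper names, used only for this illustration. */

/* Implicit fall-through from case 0 into a default that only breaks. */
int bump_implicit(int sel, int x)
{
	switch (sel) {
	case 0:
		++x;
	default:
		break;
	}
	return x;
}

/* Same logic with the break spelled out at the end of case 0. */
int bump_explicit(int sel, int x)
{
	switch (sel) {
	case 0:
		++x;
		break;
	default:
		break;
	}
	return x;
}
```

Both functions return the same value for every input, so for this one
shape the extra break is a matter of style, not a behaviour change.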
Kees Cook
2020-Nov-25 21:10 UTC
[Nouveau] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Tue, Nov 24, 2020 at 11:05:35PM -0800, James Bottomley wrote:
> Now, what we have seems to be about 6 cases (at least what's been shown
> in this thread) where a missing break would cause potentially user
> visible issues. That means the value of this isn't zero, but it's not
> a no-brainer massive win either. That's why I think asking what we've
> invested vs the return isn't a useless exercise.

The number is much higher[1]. If it were 6 in the entire history of the
kernel, I would agree with you. :) Some were fixed _before_ Gustavo's
effort too, which I also count towards the idea of "this is a dangerous
weakness in C, and now we have stopped it forever."

> But the broader point I'm making is just because the compiler people
> come up with a shiny new warning doesn't necessarily mean the problem
> it's detecting is one that causes us actual problems in the code base.
> I'd really be happier if we had a theory about what classes of CVE or
> bug we could eliminate before we embrace the next new warning.

But we did! It was long ago justified and documented[2], and even links
to the CWE[3] for it. This wasn't random joy over discovering a new
warning we could turn on, this was turning on a warning that the
compiler folks finally gave us to handle an entire class of flaws.
Whether we need to update the code-base to address it is not a useful
debate -- that was settled already, even if you're only discovering it
now. :P. This last patch set is about finishing that work for Clang,
which is correctly even more strict than GCC.
-Kees

[1] https://outflux.net/slides/2019/lss/kspp.pdf calls out specific
    numbers (about 6.5% of the patches fixed missing breaks):

	v4.19:  3 of 129
	v4.20:  2 of  59
	v5.0:   3 of  56
	v5.1:  10 of 100
	v5.2:   6 of  71
	v5.3:   7 of  69

    And in the history of the kernel, it's been an ongoing source of
    flaws:

    $ l --no-merges | grep -i 'missing break' | wc -l
    185

    The frequency of such errors being "naturally" found was pretty
    steady until the static checkers started warning, and then it was
    on the rise, but the full effort flushed the rest out, and now it's
    dropped to almost zero:

      1 v2.6.12
      3 v2.6.16.28
      1 v2.6.17
      1 v2.6.19
      2 v2.6.21
      1 v2.6.22
      3 v2.6.24
      3 v2.6.29
      1 v2.6.32
      1 v2.6.33
      1 v2.6.35
      4 v2.6.36
      3 v2.6.38
      2 v2.6.39
      7 v3.0
      2 v3.1
      2 v3.2
      2 v3.3
      3 v3.4
      1 v3.5
      8 v3.6
      7 v3.7
      3 v3.8
      6 v3.9
      3 v3.10
      2 v3.11
      5 v3.12
      5 v3.13
      2 v3.14
      4 v3.15
      2 v3.16
      3 v3.17
      2 v3.18
      2 v3.19
      1 v4.0
      2 v4.1
      5 v4.2
      4 v4.5
      5 v4.7
      6 v4.8
      1 v4.9
      3 v4.10
      2 v4.11
      6 v4.12
      3 v4.13
      2 v4.14
      5 v4.15
      2 v4.16
      7 v4.18
      2 v4.19
      6 v4.20
      3 v5.0
     12 v5.1
      3 v5.2
      4 v5.3
      2 v5.4
      1 v5.8

    And the reason it's not fully zero is because we still have the
    cases we're cleaning up right now. Even this last one from v5.8 is
    specifically of the same type this series addresses:

                case 4:
                        color_index = TrueCModeIndex;
    +                   break;
                default:
                        return;
                }

[2] https://www.kernel.org/doc/html/latest/process/deprecated.html#implicit-switch-case-fall-through

    All switch/case blocks must end in one of:

        break;
        fallthrough;
        continue;
        goto <label>;
        return [expression];

[3] https://cwe.mitre.org/data/definitions/484.html

--
Kees Cook
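The five permitted endings listed under [2] can be seen together in
one small userspace sketch. The function name, the input codes and the
scores are invented for illustration. The fallthrough macro here is a
local stand-in modelled on the kernel's pseudo-keyword (an attribute
where the compiler supports it, an empty statement otherwise); that
definition is an assumption of this sketch, not a copy of the kernel
header.

```c
/* Local stand-in for the kernel's "fallthrough" pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical function: every case ends in one of the five statements. */
int score_codes(const int *codes, int n)
{
	int score = 0;
	int i;

	for (i = 0; i < n; i++) {
		switch (codes[i]) {
		case 0:
			continue;	/* skip this code, go to the next one */
		case 1:
			score += 1;
			break;
		case 2:
			score += 10;
			fallthrough;	/* intentional: case 2 also does case 3's work */
		case 3:
			score += 100;
			break;
		case 4:
			goto out;	/* stop processing the remaining codes */
		default:
			return -1;	/* unknown code */
		}
	}
out:
	return score;
}
```

Because the one intentional fall-through is marked, a reader (and the
compiler) can tell it apart from a forgotten break.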