After looking at the Livermore for a while, we found the issue that was causing LTO to produce a different result. Consider the code below [1]. setup() doesn't touch bar/baz, main() doesn't reference foo. LTO finds, correctly, that it can remove the setup(), but the result is different. The code is clearly wrong, but the compiler has no right to fix user's stupidity, even at that level. The only think I can think that would mitigate the problem would be to have a warning or out-of-bounds access when it's that obvious. Clang folks, Correct me it I'm wrong, but I humbly think this should be done in the front-end, so it's easy to print out line information without requiring debug symbols to be present. How easy would be to do that in Clang's AST? I assume it's only a matter of checking the stride against the bounds of the array, if all is knowns at compile time. That would also help against segmentation fault. It wouldn't, however, work with variable length arrays, but VLA tend to segfault more often than silently overwrite other global variables. cheers, --renato [1] #include <stdio.h> struct { int foo[20]; int bar[20]; int baz[20]; } S; void setup() { int i; // OVERWRITES ALL THREE ARRAYS for (i=0; i<60; i++) S.foo[i] = i; // ONLY REFERENCES FOO } // DOESN'T USE FOO int main() { int i; setup(); printf("Bar: "); for (i=0; i<20; i++) printf("%d ", S.bar[i]); printf("\nBaz: "); for (i=0; i<20; i++) printf("%d ", S.baz[i]); puts(""); return 0; } -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130108/17058d89/attachment.html>
Hi Renato, On 08/01/13 17:16, Renato Golin wrote:> After looking at the Livermore for a while, we found the issue that was causing > LTO to produce a different result. > > Consider the code below [1]. setup() doesn't touch bar/baz, main() doesn't > reference foo. LTO finds, correctly,I don't think this is correct. At the LLVM IR level it is valid to write into bar and baz by accessing off the end of foo. GlobalOpt is wrong to think that an inbounds GEP into foo cannot be referencing bar/baz; the inbounds rules only say it can't access memory outside of S. So I think this is an optimizer bug. Ciao, Duncan. that it can remove the setup(), but the> result is different. > > The code is clearly wrong, but the compiler has no right to fix user's > stupidity, even at that level. The only think I can think that would mitigate > the problem would be to have a warning or out-of-bounds access when it's that > obvious. > > Clang folks, > > Correct me it I'm wrong, but I humbly think this should be done in the > front-end, so it's easy to print out line information without requiring debug > symbols to be present. How easy would be to do that in Clang's AST? > > I assume it's only a matter of checking the stride against the bounds of the > array, if all is knowns at compile time. That would also help against > segmentation fault. > > It wouldn't, however, work with variable length arrays, but VLA tend to segfault > more often than silently overwrite other global variables. > > cheers, > --renato > > [1] > #include <stdio.h> > > struct { > int foo[20]; > int bar[20]; > int baz[20]; > } S; > > void setup() { > int i; > // OVERWRITES ALL THREE ARRAYS > for (i=0; i<60; i++) > S.foo[i] = i; // ONLY REFERENCES FOO > } > > // DOESN'T USE FOO > int main() { > int i; > setup(); > printf("Bar: "); > for (i=0; i<20; i++) > printf("%d ", S.bar[i]); > printf("\nBaz: "); > for (i=0; i<20; i++) > printf("%d ", S.baz[i]); > puts(""); > > return 0; > } >
On 1/8/2013 10:16 AM, Renato Golin wrote:> > The code is clearly wrong, but the compiler has no right to fix user's > stupidity, even at that level.It's not fixing user's stupidity. If you don't follow language rules, this is what you can get. A warning would definitely be very helpful, but suppressing optimizations due to this is not the right way to go. For the cases when the user code cannot be fixed, maybe implementing an option like "-fpessimal-aliasing" would help (i.e. assuming that everything is aliased with everything). -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Hi Duncan, Ok, I found that even if main() does reference foo, setup() still gets chopped off and the results is the unexpected: Foo: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Bar: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Baz: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 So, while there is the issue in LTO, I still think Clang could give a warning. This is a source of buffer overflow attacks... While trying to define which pass was doing this, I commented out pass by pass in LTO and found that commenting some of them would produce a seg-fault in LLVM (haven't checked further, though), but I know that is one of: PM.add(createGlobalOptimizerPass()); PM.add(createConstantMergePass()); PM.add(createDeadArgEliminationPass()); PM.add(createInstructionCombiningPass()); // Inline small functions if (RunInliner) PM.add(createFunctionInliningPass()); PM.add(createPruneEHPass()); // Remove dead EH info. if (RunInliner) PM.add(createGlobalOptimizerPass()); PM.add(createGlobalDCEPass()); // Remove dead functions. Then I looped over all LTO passes in the populateLTOPassManager() but adding them manually (incrementally, until I passed them all) didn't allow me to reproduce the error. I'll create a bugzilla for this, and the Clang warning. If Clang folks think it's worth, they can start from there. --renato On 8 January 2013 16:20, Duncan Sands <baldrick at free.fr> wrote:> Hi Renato, > > > On 08/01/13 17:16, Renato Golin wrote: > >> After looking at the Livermore for a while, we found the issue that was >> causing >> LTO to produce a different result. >> >> Consider the code below [1]. setup() doesn't touch bar/baz, main() doesn't >> reference foo. LTO finds, correctly, >> > > I don't think this is correct. At the LLVM IR level it is valid to write > into > bar and baz by accessing off the end of foo. GlobalOpt is wrong to think > that > an inbounds GEP into foo cannot be referencing bar/baz; the inbounds rules > only > say it can't access memory outside of S. So I think this is an optimizer > bug. > > Ciao, Duncan. > > > that it can remove the setup(), but the > >> result is different. >> >> The code is clearly wrong, but the compiler has no right to fix user's >> stupidity, even at that level. The only think I can think that would >> mitigate >> the problem would be to have a warning or out-of-bounds access when it's >> that >> obvious. >> >> Clang folks, >> >> Correct me it I'm wrong, but I humbly think this should be done in the >> front-end, so it's easy to print out line information without requiring >> debug >> symbols to be present. How easy would be to do that in Clang's AST? >> >> I assume it's only a matter of checking the stride against the bounds of >> the >> array, if all is knowns at compile time. That would also help against >> segmentation fault. >> >> It wouldn't, however, work with variable length arrays, but VLA tend to >> segfault >> more often than silently overwrite other global variables. >> >> cheers, >> --renato >> >> [1] >> #include <stdio.h> >> >> struct { >> int foo[20]; >> int bar[20]; >> int baz[20]; >> } S; >> >> void setup() { >> int i; >> // OVERWRITES ALL THREE ARRAYS >> for (i=0; i<60; i++) >> S.foo[i] = i; // ONLY REFERENCES FOO >> } >> >> // DOESN'T USE FOO >> int main() { >> int i; >> setup(); >> printf("Bar: "); >> for (i=0; i<20; i++) >> printf("%d ", S.bar[i]); >> printf("\nBaz: "); >> for (i=0; i<20; i++) >> printf("%d ", S.baz[i]); >> puts(""); >> >> return 0; >> } >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130108/1da75407/attachment.html>
On 1/8/2013 10:20 AM, Duncan Sands wrote:> > I don't think this is correct. At the LLVM IR level it is valid to > write into bar and baz by accessing off the end of foo.I'd have to find the actual clauses in the standard, but I'm pretty sure that neither C nor C++ allow this. If the LLVM IR does not prohibit it, then it's likely that we lose some information in the translation process (unless it's indicated by metadata). -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
On Tue, Jan 8, 2013 at 8:16 AM, Renato Golin <renato.golin at linaro.org> wrote:> After looking at the Livermore for a while, we found the issue that was > causing LTO to produce a different result. > > Consider the code below [1]. setup() doesn't touch bar/baz, main() doesn't > reference foo. LTO finds, correctly, that it can remove the setup(), but the > result is different. > > The code is clearly wrong,Pretty sure this is UB. I believe if foo/bar/baz were instead an array of arrays it might be well defined.> but the compiler has no right to fix user's > stupidity, even at that level.I'm not sure what you mean by "fix user's stupidity" here - could you clarify?> The only think I can think that would > mitigate the problem would be to have a warning or out-of-bounds access when > it's that obvious.Possible, though I don't think we have anything that goes quite that far yet (computing the dynamic range of 'i' & checking that it's beyond the bounds of the array, etc). This seems more like the purview of the Clang Static Analyzer. Have you tried it to see if it catches this case?> Clang folks, > > Correct me it I'm wrong, but I humbly think this should be done in the > front-end, so it's easy to print out line information without requiring > debug symbols to be present.Regardless of the technical issues, we generally don't emit any diagnostics from places other than the front-end for reliability reasons. You can see this in GCC's optimizer-based warnings: they're quite unreliable. Subtle changes in code lead to certain optimizations/analyses not firing & warnings appear/disappear with relatively little logic/sense to the user. In short: that option is not something that's ever really considered for Clang diagnostics.> How easy would be to do that in Clang's AST?It's more the proof necessary to compute the bounds of 'i' and access to the arrays that's the interesting part. Possible, & it wouldn't surprise me if the SA already does this work, but not something I think we'd end up doing in clang proper.> I assume it's only a matter of checking the stride against the bounds of the > array, if all is knowns at compile time. That would also help against > segmentation fault. > > It wouldn't, however, work with variable length arrays, but VLA tend to > segfault more often than silently overwrite other global variables. > > cheers, > --renato > > [1] > #include <stdio.h> > > struct { > int foo[20]; > int bar[20]; > int baz[20]; > } S; > > void setup() { > int i; > // OVERWRITES ALL THREE ARRAYS > for (i=0; i<60; i++) > S.foo[i] = i; // ONLY REFERENCES FOO > } > > // DOESN'T USE FOO > int main() { > int i; > setup(); > printf("Bar: "); > for (i=0; i<20; i++) > printf("%d ", S.bar[i]); > printf("\nBaz: "); > for (i=0; i<20; i++) > printf("%d ", S.baz[i]); > puts(""); > > return 0; > } > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On 8 January 2013 16:53, David Blaikie <dblaikie at gmail.com> wrote:> I'm not sure what you mean by "fix user's stupidity" here - could you > clarify? >Buffer overrun on foo[20] and relying on it for bar[20]. It might not even be an error to access foo[50] even though foo only has 20 elements (via pointer indirection rules), but it's user error to do so, and if the standard allows that (I'm yet to find the paragraph), then the compiler has no right to "fix" it. If it's undefined, than LTO is completely right and nothing should be done. The "stupidity" part is to rely on undefined behaviour. Mind you, the stupidity in this case was mine. I removed functions from Livermore that I though were harmless, and added a few arrays to be initialized by others and haven't checked that the ranges were dynamic. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130108/85eec297/attachment.html>