David Blaikie via llvm-dev
2016-Mar-21 23:46 UTC
[llvm-dev] Need help with code generation
On Mon, Mar 21, 2016 at 4:42 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Mar 22, 2016 at 12:32 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>> On Mon, Mar 21, 2016 at 4:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> From the user's point of view, I think it's still the same. Unless
>>> LLVM is guaranteed to be free of undefined behavior (including any
>>> unknown bugs), users are not protected from getting undefined outputs.
>>> (And please keep in mind that we are talking about rare cases, such as
>>> ELF files you created by hand or with a buggy tool.)
>>
>> The same is true of any software (all software has bugs) - including the
>> software outside that would be forking a subprocess to run lld, no?
>>
>> With LLVM we consider these bugs and fix them (or at least accept
>> patches to fix them pretty much without question). It seems like the
>> bar for getting such a patch into LLD is being set much higher - this
>> seems problematic to me at least.
>>
>> A library doesn't have to be guaranteed to be free of bugs to be a
>> library - that seems like an unrealistic standard (& one not present in
>> any other project that I know of)
>>
>> In any case, I'm talking about just LLD itself when I'm expressing
>> concern about "not a bug UB". It seems very different for the user of
>> lld at the command line between "this program will give a short error
>> and exit(1)" and "this program has known/intended undefined behavior".
>> Even on uncommon inputs.
>
> As long as you can't prove that a program has no UB bug, you cannot say
> that "this program has no undefined behavior." From the user's point of
> view, it is still UB even if it is known to developers and fixed in an
> earlier version.

Then pretty much all software and all libraries do not meet the bar you are
describing - so why do so many try to fix these bugs? And if such a program
or library is willing to say "we'll fix bugs if we find them" and wants to
use lld - wouldn't it be reasonable to support them? Since that's pretty
much the bar to which most software is developed.

> Again, I'd like to emphasize that we are talking about an ill-formed ELF
> header or something. If you are intentionally trying to break the linker,
> I'd say "don't do that." As long as your input is not corrupted in terms
> of file formatting, LLD's behavior is well defined (as far as we can
> guarantee.)

Right - all we're asking for is the same guarantee for other inputs (not a
very strong guarantee - you haven't proved it's defined for all valid
inputs, I'm sure; formal proofs are really expensive, and buggy, and even
fuzzing just helps, it doesn't guarantee) - in both cases we know it's not
an iron clad guarantee/proven truth.

>> - David
>>
>>> On Tue, Mar 22, 2016 at 12:02 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>> On Mon, Mar 21, 2016 at 2:54 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Mon, Mar 21, 2016 at 10:49 PM, David Blaikie via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> On Mon, Mar 21, 2016 at 2:46 PM, Rafael Espíndola
>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> On 21 March 2016 at 17:34, Tim Northover via llvm-dev
>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>>> My understanding is that clang and llvm themselves are designed
>>>>>>>>> this way (crash when the unexpected happens).
>>>>>>>>
>>>>>>>> I don't think so. I'd view any Clang crash as a bug (probably to
>>>>>>>> be prioritised below silent CodeGen and many others, but not
>>>>>>>> "working as designed").
>>>>>>>>
>>>>>>>>> For example the fact that clang forks itself to be able to
>>>>>>>>> report diagnostics
>>>>>>>>
>>>>>>>> That seems like just trying to make our own job easier to me. I
>>>>>>>> think the entire point of the fork is to get a backtrace we can
>>>>>>>> fix, and point out where the user should send it.
>>>>>>>>
>>>>>>>>> llvm is full of report_fatal_error() (or worse, assertions that
>>>>>>>>> can fire on unexpected user input).
>>>>>>>>
>>>>>>>> A bit of a grey area since LLVM isn't itself a user-facing tool,
>>>>>>>> but I think I'd still say that a report_fatal_error that's not
>>>>>>>> actionable by the user is actually an LLVM bug. And a segfault
>>>>>>>> definitely so.
>>>>>>>
>>>>>>> It is completely trivial to crash llvm. A case I wrote today in
>>>>>>> another thread while waiting for tests to run:
>>>>>>>
>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>> @".data" = global i32 42
>>>>>>>
>>>>>>> That will crash "llc -filetype=obj". The fact that it is
>>>>>>> considered a bug doesn't mean much if there is no coordinated
>>>>>>> effort to fix them.
>>>>>>
>>>>>> I think it does, actually - that patches will be accepted to fix
>>>>>> pretty much any crash in LLVM. (llc isn't a user facing tool, so
>>>>>> that's a particularly low priority - but as a general library (I
>>>>>> assume your example also crashes Clang, which would be where this
>>>>>> would surface in a more important way) it's pretty well accepted
>>>>>> that crashes are bugs, I think)
>>>>>>
>>>>>>> Right now lld is already harder to crash than llvm. We are just
>>>>>>> being honest about the fact that it is possible to craft a .o file
>>>>>>> that will crash it.
>>>>>>
>>>>>> But the difference seems to be you know about these cases and don't
>>>>>> consider them to be bugs/anything to fix. In LLVM if they're known,
>>>>>> they're at least considered bugs and often/usually considered by
>>>>>> someone to be worth fixing at some point.
>>>>>
>>>>> I think this is the same from the user's point of view. If LLVM is
>>>>> not crash-bug-free in the version you are using, you need some
>>>>> precaution such as forking in order to protect your program from
>>>>> crashing if you need a 100% guarantee.
>>>>
>>>> Crashes seem very different from a user's point of view - does the
>>>> program execute undefined behavior (potentially silently producing
>>>> output and exiting 0) or does it have well defined behavior (even if
>>>> that behavior is "print an error and exit(1)").
>>>>
>>>>>> - Dave
>>>>>>
>>>>>>> Cheers,
>>>>>>> Rafael
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
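The "forking as a precaution" idea and the exit-status distinction argued over above can be illustrated with a small sketch. It is not taken from the thread or from lld: the helper name run_tool and the stand-in child command are hypothetical, and the "killed by a signal" branch relies on the POSIX behavior of Python's subprocess module. Note that it can only classify how the child ended; as the message above points out, an exit status of 0 says nothing about whether undefined behavior silently produced bad output.

```python
import subprocess
import sys


def run_tool(argv):
    """Run a tool in a child process and classify how it ended.

    Returns a (status, detail) pair where status is one of
    "ok", "error" or "crashed". The name run_tool is hypothetical.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        # The tool claims success. This does not prove the output is sane.
        return "ok", proc.stderr.strip()
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # a signal (for example a segfault), not that it reported an error.
        return "crashed", f"killed by signal {-proc.returncode}"
    # Well-defined failure: the tool printed a diagnostic and exited non-zero.
    return "error", proc.stderr.strip()


if __name__ == "__main__":
    # Stand-in child that reports an error and exits 1, as a linker might.
    child = [sys.executable, "-c", "import sys; sys.exit('error: bad input')"]
    print(run_tool(child))
```

A caller that links through a subprocess could use the "crashed" case to report a tool bug and the "error" case to show the diagnostic to its own user.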
On Tue, Mar 22, 2016 at 12:46 AM, David Blaikie <dblaikie at gmail.com> wrote:

> Then pretty much all software and all libraries do not meet the bar you
> are describing - so why do so many try to fix these bugs? And if such a
> program or library is willing to say "we'll fix bugs if we find them" and
> wants to use lld - wouldn't it be reasonable to support them? Since that's
> pretty much the bar to which most software is developed.

A kernel is allowed to (and chooses to) crash with panic() if a device
behaves weirdly. LLVM's pass can crash if the previous pass is buggy. Many
regexp engines go into virtually infinite loops if you give them a malicious
regexp. And any program can do anything weird if there is a bug. What we can
do is set a boundary and make best effort to guarantee that as long as you
are within the boundary, we handle any input in some reasonable way. This is
what we do -- and other programs do. And where the boundary should be set
depends on the program.

>> Again, I'd like to emphasize that we are talking about an ill-formed ELF
>> header or something. If you are intentionally trying to break the
>> linker, I'd say "don't do that." As long as your input is not corrupted
>> in terms of file formatting, LLD's behavior is well defined (as far as
>> we can guarantee.)
>
> Right - all we're asking for is the same guarantee for other inputs (not a
> very strong guarantee - you haven't proved it's defined for all valid
> inputs, I'm sure; formal proofs are really expensive, and buggy, and even
> fuzzing just helps, it doesn't guarantee) - in both cases we know it's not
> an iron clad guarantee/proven truth.
Hi Rui,

> LLVM's pass can crash if the previous pass is buggy.

That's a bug that should be fixed in the previous pass.

> What we can do is set a boundary and make best effort to guarantee that as
> long as you are within the boundary, we handle any input in some reasonable
> way.

That boundary is usually user input. We assume that the program's memory
hasn't been compromised, but anything the user puts in should be treated
with suspicion. Would you use a browser that didn't check for buffer
overruns?

Part of the problem is that we assume the linker is being used in a context
where the input can be trusted. A lot of the time that's true, but assuming
it limits the contexts in which LLD could be used. For example, you couldn't
use LLD as the linker in a build farm if it crashed on malformed input -
what's to stop someone uploading a malformed ELF file and tricking the
linker into sniffing other projects being built on the same server?

Cheers,
Lang.

On Mon, Mar 21, 2016 at 4:56 PM, Rui Ueyama via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

> A kernel is allowed to (and chooses to) crash with panic() if a device
> behaves weirdly. LLVM's pass can crash if the previous pass is buggy. Many
> regexp engines go into virtually infinite loops if you give them a
> malicious regexp. And any program can do anything weird if there is a bug.
> What we can do is set a boundary and make best effort to guarantee that as
> long as you are within the boundary, we handle any input in some
> reasonable way. This is what we do -- and other programs do. And where the
> boundary should be set depends on the program.
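To make "treat user input with suspicion" concrete, here is a minimal sketch of checking an ELF64 header before trusting any offset it contains, so that a malformed file yields a short error and exit status 1 rather than an out-of-bounds read. This is an illustration only and not how LLD is implemented: the names MalformedInput, check_elf64_header and main are hypothetical, it handles only little-endian ELF64, and the field offsets are those of the standard Elf64_Ehdr layout.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
ELF64_HEADER_SIZE = 64  # sizeof(Elf64_Ehdr)


class MalformedInput(Exception):
    """Raised instead of reading past the end of the input buffer."""


def check_elf64_header(data: bytes) -> int:
    """Validate the header and return e_shoff, or raise MalformedInput."""
    if len(data) < ELF64_HEADER_SIZE:
        raise MalformedInput("file too short for an ELF64 header")
    if data[:4] != ELF_MAGIC:
        raise MalformedInput("bad ELF magic")
    if data[4] != 2:  # EI_CLASS: 2 is ELFCLASS64
        raise MalformedInput("not a 64-bit ELF file")
    if data[5] != 1:  # EI_DATA: 1 is ELFDATA2LSB (little-endian)
        raise MalformedInput("not a little-endian ELF file")
    # e_shoff is at offset 0x28; e_shentsize and e_shnum at 0x3A and 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    # Reject a section header table that would extend past the file end.
    if e_shoff + e_shentsize * e_shnum > len(data):
        raise MalformedInput("section header table extends past end of file")
    return e_shoff


def main(path: str) -> int:
    """Print a short error and return 1 on bad input, else return 0."""
    try:
        with open(path, "rb") as f:
            check_elf64_header(f.read())
    except (OSError, MalformedInput) as err:
        print(f"error: {path}: {err}", file=sys.stderr)
        return 1
    return 0
```

In a language without bounds checking the same comparisons would need to guard against integer overflow as well; Python's arbitrary-precision integers sidestep that here.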