Mikael Lyngvig
2013-Nov-27 02:58 UTC
[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
Thanks for the lecture :) But I was not planning on changing a single line in LLVM/Clang. I stick to the documentation until I've learned to swim, perhaps even forever. Ah, now I see. You thought I meant "should I modify the code to do this or that." I only meant to change the documentation. Please refer to the patch I've sent on LLVM-commits. That's about what I had in mind. I am fully aware that you cannot simply dive in and hack away on the handling of the %0 temporary. I wouldn't ever dream of doing that! -- Mikael 2013/11/27 Sean Silva <chisophugis at gmail.com>> (gah, this turned into a huge digression, sorry) > > The implicit numbering of BB's seems to be a pretty frequent issue for > people. Surprisingly, the issue boils down to simply changing the IR asm > (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way > that fits in with how unnamed values work (the asmprinter makes an effort > to print a comment with the BB number, but the connection is hard to see > and it's confusing). > > The thing that makes this not-as-easy-as-it-looks is doing it in a way > that preserves compatibility with previous IR (and being able to convince > yourself that this is the case), and the fact that the IR-parsing code is a > bit twisty (it's not bad, but the way that some things work is subtly > different from what you would expect) and you have to find something that > "fits well" with what's there, doesn't require major reworking of the > existing code, etc. > > An alternative approach is to document very clearly this issue. That might > be good in the short term, but IMO the time would be better spent > ruminating over a way to fit this into the syntax, and thinking > deeply/finding a way to convince yourself and others that this change > doesn't break previous .ll files. > > It's just about thinking and coming up with a new syntax that fits well > and that won't break existing .ll files. The key places for making this > round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp > and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing > side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need > to find a way to get a new token "LocalLabelID" returned for the new syntax. > > To reiterate, the goal of such a change is solely to avoid people getting > confused about the implicit numbering. It needs to be > reminiscent/suggestive of the instruction numbering syntax to avoid this > confusion. > > Heck, there may be something within the existing syntax that would work > fine for this, but which we can recognize as being "unnamed", rather than a > unique name e.g. currently $1: will give the BB a name "$1" (in the sense > of getName()), and then "$2:" will give a name "$2", etc., which will cause > a lot of pointless string allocations; recognizing a decimal number here > might be all that's needed (and updating the outputting code accordingly), > although I'm not sure a prefix $ is the best syntax. > > Maybe we could even get away with %42: as a BB label and that would be > maximally reminiscent. The way that numbered local variables are handled is > sort of ad-hoc (it is actually also handled in the Lexer; all the parser > sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in > LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token, > then adding an `else if` to the first `if` in LLParser::ParseBasicBlock, > you could probably get a working solution too. However, this introduces an > inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for > unnamed things for both BB's and instructions, but in the case of > instructions, the % sigil is always needed, while the label syntax isn't > sigilized by default, but permits this weird sigilized temporary numbered > form. Maybe that slight inconsistency is worth it? If the inconsistency is > really bothersome, we could also have BB's be able to start sigilized with > % in the other case like instructions are (there is no ambiguity because of > the trailing `:`), but allow the unsigilized versions for compatibility; > this may be more consistent from a semantic perspective too, since we refer > to them sigilized when used as instruction operands. > > Or maybe you could have the BB be numbered just like `42:` without the > sigil. We already lex a label like 42:, but we just have the issue that I > mentioned with $1: that we set this string as the getName() value which > creates a bunch of useless strings. If you just change the code to emit a > "LocalLabelID" for this case and imitate how we handle locally numbered > instructions, that could be a pretty clean fix. However, that would change > the behavior for how we handle a label like `0:`, for example, with this > behavior, the following IR asm would work: > > define void @foo() { > 0: > %1 = alloca i8* > ret void > } > > but since with our current behavior we handle `0:` as a BB name and set > it's getName() as "0", which causes it to not take up the first unnamed > value slot (the %0'th one), so then you get an error that %1 should be %0. > This may be an annoying forwards-compatibility issue for a while when we > still have to work with not-trunk LLVM's, and this incompatibility may not > be worth it. Actually all the suggestions that I've made so far have this > same issue :/ Actually I think that it is unsolvable without a > forwards-compatibility break due to this (any label that was previously > accepted would not increment the unnamed local counter, which would cause > all the existing unnamed locals to be off by one and cause an error). We do > break forward-compatibility from time to time (e.g. the syntax for the new > attributes system), so it might not be that big of an issue (although > obviously the community will have to decide about the trade-off for a > temporary nuisance vs. the issue this solves). If breaking > forwards-compatibility is OK, then I would strongly suggest the `0:` syntax > or `%0:`. > > Hopefully I've given you a bit of the flavor of the issues involved. It's > basically just a problem of sitting down and thinking hard, finding > something cleanly-implementable that doesn't break backwards compatibility, > and checking with the community that the syntax is agreeable and that any > forwards-compatibility break is ok. > > -- Sean Silva > > > On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org>wrote: > >> The language reference states that local temporaries begin with index 0, >> but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a week >> old), I get an error "instruction expected to be numbered '%1'". >> >> Also, quite a few examples in the LR uses %0 as a local identifier. >> >> Should I fix those or is it a problem in llc? >> >> >> -- Mikael >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131127/e2abe391/attachment.html>
Sean Silva
2013-Nov-27 03:24 UTC
[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
On Tue, Nov 26, 2013 at 9:58 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:> Thanks for the lecture :) But I was not planning on changing a single > line in LLVM/Clang. I stick to the documentation until I've learned to > swim, perhaps even forever. Ah, now I see. You thought I meant "should I > modify the code to do this or that." I only meant to change the > documentation. Please refer to the patch I've sent on LLVM-commits. > That's about what I had in mind. I am fully aware that you cannot simply > dive in and hack away on the handling of the %0 temporary. I wouldn't ever > dream of doing that! >You should dream of doing that. Nobody else has stepped up to do it. Hack on the code; ultimately that's where the action is and where you will gain understanding. (And I'm probably the worst person to give this advice since I do so little code hacking during the school year. I swear, I really do prefer coding; when I'm at work with a nice fast machine it's a lot nicer to hack, but at school with a crappy machine, the situation usually only permits reviewing patches on the mailing lists or docs changes.) AFAIK nobody is an "expert" in that code (its probably long out of core for even the people that wrote it); if you dive into it, you can become a local expert in it. -- Sean Silva> > > -- Mikael > > > > > 2013/11/27 Sean Silva <chisophugis at gmail.com> > >> (gah, this turned into a huge digression, sorry) >> >> The implicit numbering of BB's seems to be a pretty frequent issue for >> people. Surprisingly, the issue boils down to simply changing the IR asm >> (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way >> that fits in with how unnamed values work (the asmprinter makes an effort >> to print a comment with the BB number, but the connection is hard to see >> and it's confusing). >> >> The thing that makes this not-as-easy-as-it-looks is doing it in a way >> that preserves compatibility with previous IR (and being able to convince >> yourself that this is the case), and the fact that the IR-parsing code is a >> bit twisty (it's not bad, but the way that some things work is subtly >> different from what you would expect) and you have to find something that >> "fits well" with what's there, doesn't require major reworking of the >> existing code, etc. >> >> An alternative approach is to document very clearly this issue. That >> might be good in the short term, but IMO the time would be better spent >> ruminating over a way to fit this into the syntax, and thinking >> deeply/finding a way to convince yourself and others that this change >> doesn't break previous .ll files. >> >> It's just about thinking and coming up with a new syntax that fits well >> and that won't break existing .ll files. The key places for making this >> round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp >> and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing >> side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need >> to find a way to get a new token "LocalLabelID" returned for the new syntax. >> >> To reiterate, the goal of such a change is solely to avoid people getting >> confused about the implicit numbering. It needs to be >> reminiscent/suggestive of the instruction numbering syntax to avoid this >> confusion. >> >> Heck, there may be something within the existing syntax that would work >> fine for this, but which we can recognize as being "unnamed", rather than a >> unique name e.g. currently $1: will give the BB a name "$1" (in the sense >> of getName()), and then "$2:" will give a name "$2", etc., which will cause >> a lot of pointless string allocations; recognizing a decimal number here >> might be all that's needed (and updating the outputting code accordingly), >> although I'm not sure a prefix $ is the best syntax. >> >> Maybe we could even get away with %42: as a BB label and that would be >> maximally reminiscent. The way that numbered local variables are handled is >> sort of ad-hoc (it is actually also handled in the Lexer; all the parser >> sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in >> LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token, >> then adding an `else if` to the first `if` in LLParser::ParseBasicBlock, >> you could probably get a working solution too. However, this introduces an >> inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for >> unnamed things for both BB's and instructions, but in the case of >> instructions, the % sigil is always needed, while the label syntax isn't >> sigilized by default, but permits this weird sigilized temporary numbered >> form. Maybe that slight inconsistency is worth it? If the inconsistency is >> really bothersome, we could also have BB's be able to start sigilized with >> % in the other case like instructions are (there is no ambiguity because of >> the trailing `:`), but allow the unsigilized versions for compatibility; >> this may be more consistent from a semantic perspective too, since we refer >> to them sigilized when used as instruction operands. >> >> Or maybe you could have the BB be numbered just like `42:` without the >> sigil. We already lex a label like 42:, but we just have the issue that I >> mentioned with $1: that we set this string as the getName() value which >> creates a bunch of useless strings. If you just change the code to emit a >> "LocalLabelID" for this case and imitate how we handle locally numbered >> instructions, that could be a pretty clean fix. However, that would change >> the behavior for how we handle a label like `0:`, for example, with this >> behavior, the following IR asm would work: >> >> define void @foo() { >> 0: >> %1 = alloca i8* >> ret void >> } >> >> but since with our current behavior we handle `0:` as a BB name and set >> it's getName() as "0", which causes it to not take up the first unnamed >> value slot (the %0'th one), so then you get an error that %1 should be %0. >> This may be an annoying forwards-compatibility issue for a while when we >> still have to work with not-trunk LLVM's, and this incompatibility may not >> be worth it. Actually all the suggestions that I've made so far have this >> same issue :/ Actually I think that it is unsolvable without a >> forwards-compatibility break due to this (any label that was previously >> accepted would not increment the unnamed local counter, which would cause >> all the existing unnamed locals to be off by one and cause an error). We do >> break forward-compatibility from time to time (e.g. the syntax for the new >> attributes system), so it might not be that big of an issue (although >> obviously the community will have to decide about the trade-off for a >> temporary nuisance vs. the issue this solves). If breaking >> forwards-compatibility is OK, then I would strongly suggest the `0:` syntax >> or `%0:`. >> >> Hopefully I've given you a bit of the flavor of the issues involved. It's >> basically just a problem of sitting down and thinking hard, finding >> something cleanly-implementable that doesn't break backwards compatibility, >> and checking with the community that the syntax is agreeable and that any >> forwards-compatibility break is ok. >> >> -- Sean Silva >> >> >> On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org>wrote: >> >>> The language reference states that local temporaries begin with index 0, >>> but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a week >>> old), I get an error "instruction expected to be numbered '%1'". >>> >>> Also, quite a few examples in the LR uses %0 as a local identifier. >>> >>> Should I fix those or is it a problem in llc? >>> >>> >>> -- Mikael >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131126/be53b1f6/attachment.html>
Mikael Lyngvig
2013-Nov-27 03:35 UTC
[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
Without ANY intent of offending anybody, I simply don't like C++. I did code in it for some 12 years back from 1990 to 2002, but then I left it behind with a feeling of happiness. The main reason I am _trying_ to make a new language is that I hope to one day come up with something that can help retiring C++. I love C# but that language is yet too slow for many demanding problem domains. That being said, I don't seriously believe I'll ever finish up my own language, but as long as I am having a good time along the way, I don't mind. Now I spend the majority of my spare time on LLVM documentation (most of it still pending submission because of various factors). Once the dust settles from all the documentation projects I've started on (Arch Linux build doc, Debian build doc, Windows build doc, Mapping High-Level Constructs to LLVM IR), I plan to resume work on my own language, which will be something like Python-syntax C# without .NET and perhaps with optional garbage collection. Perhaps I'll some day gather up the courage to pick an easy bug report and fix that, but it is not very likely that I ever become a core coder on LLVM. -- Mikael 2013/11/27 Sean Silva <chisophugis at gmail.com>> > > > On Tue, Nov 26, 2013 at 9:58 PM, Mikael Lyngvig <mikael at lyngvig.org>wrote: > >> Thanks for the lecture :) But I was not planning on changing a single >> line in LLVM/Clang. I stick to the documentation until I've learned to >> swim, perhaps even forever. Ah, now I see. You thought I meant "should I >> modify the code to do this or that." I only meant to change the >> documentation. Please refer to the patch I've sent on LLVM-commits. >> That's about what I had in mind. I am fully aware that you cannot simply >> dive in and hack away on the handling of the %0 temporary. I wouldn't ever >> dream of doing that! >> > > You should dream of doing that. Nobody else has stepped up to do it. Hack > on the code; ultimately that's where the action is and where you will gain > understanding. > (And I'm probably the worst person to give this advice since I do so > little code hacking during the school year. I swear, I really do prefer > coding; when I'm at work with a nice fast machine it's a lot nicer to hack, > but at school with a crappy machine, the situation usually only permits > reviewing patches on the mailing lists or docs changes.) > > AFAIK nobody is an "expert" in that code (its probably long out of core > for even the people that wrote it); if you dive into it, you can become a > local expert in it. > > -- Sean Silva > > >> >> >> -- Mikael >> >> >> >> >> 2013/11/27 Sean Silva <chisophugis at gmail.com> >> >>> (gah, this turned into a huge digression, sorry) >>> >>> The implicit numbering of BB's seems to be a pretty frequent issue for >>> people. Surprisingly, the issue boils down to simply changing the IR asm >>> (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way >>> that fits in with how unnamed values work (the asmprinter makes an effort >>> to print a comment with the BB number, but the connection is hard to see >>> and it's confusing). >>> >>> The thing that makes this not-as-easy-as-it-looks is doing it in a way >>> that preserves compatibility with previous IR (and being able to convince >>> yourself that this is the case), and the fact that the IR-parsing code is a >>> bit twisty (it's not bad, but the way that some things work is subtly >>> different from what you would expect) and you have to find something that >>> "fits well" with what's there, doesn't require major reworking of the >>> existing code, etc. >>> >>> An alternative approach is to document very clearly this issue. That >>> might be good in the short term, but IMO the time would be better spent >>> ruminating over a way to fit this into the syntax, and thinking >>> deeply/finding a way to convince yourself and others that this change >>> doesn't break previous .ll files. >>> >>> It's just about thinking and coming up with a new syntax that fits well >>> and that won't break existing .ll files. The key places for making this >>> round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp >>> and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing >>> side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need >>> to find a way to get a new token "LocalLabelID" returned for the new syntax. >>> >>> To reiterate, the goal of such a change is solely to avoid people >>> getting confused about the implicit numbering. It needs to be >>> reminiscent/suggestive of the instruction numbering syntax to avoid this >>> confusion. >>> >>> Heck, there may be something within the existing syntax that would work >>> fine for this, but which we can recognize as being "unnamed", rather than a >>> unique name e.g. currently $1: will give the BB a name "$1" (in the sense >>> of getName()), and then "$2:" will give a name "$2", etc., which will cause >>> a lot of pointless string allocations; recognizing a decimal number here >>> might be all that's needed (and updating the outputting code accordingly), >>> although I'm not sure a prefix $ is the best syntax. >>> >>> Maybe we could even get away with %42: as a BB label and that would be >>> maximally reminiscent. The way that numbered local variables are handled is >>> sort of ad-hoc (it is actually also handled in the Lexer; all the parser >>> sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in >>> LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token, >>> then adding an `else if` to the first `if` in LLParser::ParseBasicBlock, >>> you could probably get a working solution too. However, this introduces an >>> inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for >>> unnamed things for both BB's and instructions, but in the case of >>> instructions, the % sigil is always needed, while the label syntax isn't >>> sigilized by default, but permits this weird sigilized temporary numbered >>> form. Maybe that slight inconsistency is worth it? If the inconsistency is >>> really bothersome, we could also have BB's be able to start sigilized with >>> % in the other case like instructions are (there is no ambiguity because of >>> the trailing `:`), but allow the unsigilized versions for compatibility; >>> this may be more consistent from a semantic perspective too, since we refer >>> to them sigilized when used as instruction operands. >>> >>> Or maybe you could have the BB be numbered just like `42:` without the >>> sigil. We already lex a label like 42:, but we just have the issue that I >>> mentioned with $1: that we set this string as the getName() value which >>> creates a bunch of useless strings. If you just change the code to emit a >>> "LocalLabelID" for this case and imitate how we handle locally numbered >>> instructions, that could be a pretty clean fix. However, that would change >>> the behavior for how we handle a label like `0:`, for example, with this >>> behavior, the following IR asm would work: >>> >>> define void @foo() { >>> 0: >>> %1 = alloca i8* >>> ret void >>> } >>> >>> but since with our current behavior we handle `0:` as a BB name and set >>> it's getName() as "0", which causes it to not take up the first unnamed >>> value slot (the %0'th one), so then you get an error that %1 should be %0. >>> This may be an annoying forwards-compatibility issue for a while when we >>> still have to work with not-trunk LLVM's, and this incompatibility may not >>> be worth it. Actually all the suggestions that I've made so far have this >>> same issue :/ Actually I think that it is unsolvable without a >>> forwards-compatibility break due to this (any label that was previously >>> accepted would not increment the unnamed local counter, which would cause >>> all the existing unnamed locals to be off by one and cause an error). We do >>> break forward-compatibility from time to time (e.g. the syntax for the new >>> attributes system), so it might not be that big of an issue (although >>> obviously the community will have to decide about the trade-off for a >>> temporary nuisance vs. the issue this solves). If breaking >>> forwards-compatibility is OK, then I would strongly suggest the `0:` syntax >>> or `%0:`. >>> >>> Hopefully I've given you a bit of the flavor of the issues involved. >>> It's basically just a problem of sitting down and thinking hard, finding >>> something cleanly-implementable that doesn't break backwards compatibility, >>> and checking with the community that the syntax is agreeable and that any >>> forwards-compatibility break is ok. >>> >>> -- Sean Silva >>> >>> >>> On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org>wrote: >>> >>>> The language reference states that local temporaries begin with index >>>> 0, but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a >>>> week old), I get an error "instruction expected to be numbered '%1'". >>>> >>>> Also, quite a few examples in the LR uses %0 as a local identifier. >>>> >>>> Should I fix those or is it a problem in llc? >>>> >>>> >>>> -- Mikael >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131127/01206dc0/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
- [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
- [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
- [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.
- [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.