Sean Silva
2015-Feb-02 20:43 UTC
[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?
On Mon, Feb 2, 2015 at 9:51 AM, Robin Eklind <carl.eklind at myport.ac.uk> wrote:

> (forgot to cc the list)
>
> Answers, questions and assumptions are inlined in the response.
>
> If someone with knowledge of the LLVM IR type system could take a look at
> my assumptions below I'd be very happy.
>
> On 01/30/2015 02:24 AM, Sean Silva wrote:
>
>> On Thu, Jan 29, 2015 at 10:42 PM, Robin Eklind
>> <carl.eklind at myport.ac.uk> wrote:
>>
>>> Thank you for reviewing and committing the patch Sean :) It was the
>>> first one I've ever submitted to LLVM and the whole process was really
>>> smooth! Using Phabricator with GitHub OAuth login was brilliant as it
>>> removed one more step for new contributors. I also feel very happy that
>>> the first patch ended up removing more code than it introduced :) Not
>>> likely to speed up the compilation process by a lot, but one can hope
>>> to keep the trend!
>>
>> Great!
>>
>>> I read the blog post about the type system rewrite. Thank you for the
>>> link. It did clear up a lot of my uncertainties, but introduced a new
>>> one. Could you help me make sense of this part, which was presented
>>> under the "Identified structs have a 1-1 mapping with a name" section.
>>>
>>>> "... and the only types that can be named are identified structs"
>>>
>>> Does this mean that other types cannot be named? What about the type
>>> "%x" in b.ll? It seems like I'm interpreting this in the wrong way.
>>> Could you help me make this clear? Is there a difference between a
>>> named type and an identified type (or are those two ways of saying the
>>> same thing)? If types other than structures can be given names, does
>>> this name impact type equality somehow?
>>
>> I'll need to punt to someone else for these questions. I haven't dealt
>> with this part of the IR in a while.
>
> Anyone else knowledgeable in this area? I would like to list a set of
> assumptions that I've made after reading the blog post and experimenting
> with the reference implementation. If anyone could verify these
> assumptions, and of course point out which are incorrect, I'd be very
> grateful.
>
> * Assumption 1 - all types can be given a name, not only structures.
> * Assumption 2 - the type name works as an alias for all types except
>   structures, and it is ignored when calculating type equality.
> * Assumption 3 - for structures the type name works as an identity, and
>   type equality depends on it.
> * Assumption 4 - type equality is calculated by comparing the base type
>   (i.e. the underlying type of a type name identifier) of one type
>   against another (recursively and for each element in the case of
>   vectors, arrays and other derived types). In the case of identified
>   structures the comparison is made strictly based on the structure's
>   name, and in the case of structure literals the comparison is made in
>   the same way as for other derived types.

There are quite a few people on the list that can answer this. Just a
matter of waiting for one of them to pipe up.

>>> To keep up with the spirit of the original topic here are a few more
>>> items :)
>>>
>>> * Item 11 - hexadecimal integer constants
>>>
>>> The lexer handles hexadecimal integer constants, e.g. from
>>> lib/AsmParser/LLLexer.cpp
>>>
>>>> /// HexIntConstant [us]0x[0-9A-Fa-f]+
>>>
>>> This representation of integer constants is not mentioned in the
>>> language specification as far as I can tell.
>>
>> I assume you are talking about the 'u' and 's' prefix? That seems like a
>> historical artifact. The type system doesn't have signedness so there is
>> no sense in which a constant can be "signed" or "unsigned". In fact, in
>> most places that even look at the signedness of the lexer's APSIntVal,
>> it's just to issue an error. A patch removing this old cruft would be
>> great.
>
> I'd be happy to remove this old cruft :) Just want to make sure I
> understood correctly. Are you referring to the prefix or the whole
> HexIntConstant representation? Because if we simply remove the prefix it
> would collide with the hexadecimal representation of floating point
> constants.

If we don't currently accept 0xDEADBEEF as an integer constant, then it's
probably safe to remove HexIntConstant altogether. That u and s prefixed
stuff is clearly out of date by several years, so clearly nobody is relying
on this if that is the only way to get a hex integer constant.

> It seems like clang has been using HexIntConstants in the past (and maybe
> still?), based on the following comment from lib/AsmParser/LLLexer.cpp:
>
> > // Check for [us]0x[0-9A-Fa-f]+ which are Hexadecimal constant generated by
> > // the CFE to avoid forcing it to deal with 64-bit numbers.
>
> Is clang still using this representation? If not, I'll start preparing a
> patch to get rid of the HexIntConstant parsing :)

I don't think any code inside of clang ever directly writes .ll files; it
all happens via the llvm libraries. So all you need to make sure is that
nowhere inside the llvm libraries will write out .ll which has this
construct.

>>> * Item 12 - constant expressions
>>>
>>> The documentation of sext states that the bit size of the constant must
>>> be smaller than the target type, but the implementation also accepts
>>> constants which have the same size as the target type. I.e. the
>>> documentation should be updated or the implementation made more strict.
>>>
>>>> sext (CST to TYPE)
>>>> Sign extend a constant to another type. The bit size of CST must be
>>>> smaller than the bit size of TYPE. Both types must be integers.
>>>
>>> The same goes for the trunc, zext, sext, fptrunc and fpext operations.
>>> Some refer to larger instead of smaller but none states that types of
>>> equal size are allowed.
>>
>> Probably worth updating the documentation to what is actually allowed by
>> the code. Could you please send a patch to LangRef? (and for
>> convenience, can you point to the relevant source code for citation?).
>
> I'll try to look into it. So far I've not found this in the source code,
> but rather by examining the behaviour of compiling .ll files with clang.

Surely there is somewhere in the llvm libraries where we either reject or
accept (through inaction) extension/truncation to types of the same size.
Maybe the verifier?

>>> * Item 13 - LocalVar and LocalID for named types
>>>
>>> This is more of a question. Why are types referred to using local names
>>> "%x" instead of global names "@x"? It seems inconsistent as local names
>>> are scoped to the function; a local variable name in one function
>>> refers to a different value from a local variable name in another.
>>> Since types are scoped to the module wouldn't a global name make more
>>> sense?
>>
>> I doubt there's a particular rationale. I wouldn't pay too much
>> attention to the sigils. They are pretty much arbitrary and just to make
>> the lexer simpler, similar to how using introducer keywords makes the
>> parser simpler.
>>
>> A more concerning inconsistency regarding sigils (if choice of sigils
>> were to be concerning) is the use of the same sigils for types and
>> values. Types are a purely compile-time thing while locals and globals
>> actually correspond to materializable run-time values (slightly muddled
>> by things like dbg.declare and llvm.assume).
>
> Would it make sense to start a discussion about this inconsistency where
> the same sigil is used for types and values? If the compatibility between
> releases is ensured using the Bitcode format, it may be possible to
> introduce a patch to the assembly representation of LLVM IR. To port old
> files to the new representation one could convert .ll files to .bc using
> the current version of llvm-as, and then convert back using a newer
> version of llvm-dis. I can understand if this is a low priority issue,
> but discussing and fixing any inconsistency in the language makes sense
> and pays off in the long run.

I don't think anybody really cares about the sigils. They are just there to
simplify the lexer/parser code. In this case, the complexity of
reconstructing the .ll files *including the FileCheck comments* is probably
not worth it (especially since any mistakes effectively end up silently
reducing our test coverage).

-- Sean Silva

>>> As always, I'm eager to hear more about the type system in particular.
>>> The compilation timed in at 120m36.240s while the test cases took
>>> 32m10.111s. It will be interesting to see if this goes up or down as
>>> time passes :)
>>
>> Unfortunately probably up. On my main machine in college, a full build
>> of LLVM + Clang took 20 minutes. Last I checked (quite some time ago),
>> that machine took 40 minutes.
>>
>> Also, btw, you can do builddir/bin/llvm-lit llvm/test/path/to/test.ll to
>> run just a single test while iterating (or shell glob a list of tests;
>> or pass a directory). There's also a way to run a subset of the
>> unittests, but I forget it off the top of my head.
>>
>> -- Sean Silva
>>
>>> Cheers /Robin Eklind
>>>
>>> On 01/28/2015 08:31 PM, Sean Silva wrote:
>>>
>>>> On Wed, Jan 28, 2015 at 6:28 PM, Robin Eklind
>>>> <carl.eklind at myport.ac.uk> wrote:
>>>>
>>>>> Hello Sean,
>>>>>
>>>>> Thank you for your reply. I'll give your suggestion to item 6 and 7 a
>>>>> try tonight. I'll start a compilation and let it run throughout the
>>>>> night. My laptop (x61s) is 8 years old by now, so compiling LLVM
>>>>> takes a little time :)
>>>>
>>>> This is why I did so much documentation work when in college. The docs
>>>> build much faster.
>>>>
>>>>> Regarding item 8. I don't know if anyone is using "": in the wild so
>>>>> fixing the implementation might make sense. If not the documentation
>>>>> (e.g. the QuoteLabel comment) should be updated to be in line with
>>>>> the implementation.
>>>>
>>>> FYI the textual IR doesn't have a compatibility guarantee (we try not
>>>> to egregiously change it, but users don't expect .ll to work across
>>>> versions).
>>>>
>>>>> I only included item 9 since I stumbled upon it once
>>>>> cross-referencing the source code with the language specification.
>>>>> Bitrot for a project of this size is to be expected.
>>>>>
>>>>> I'm still very interested to hear about the items related to types,
>>>>> e.g. item 1 and 2. Is there a good reference which describes how type
>>>>> equality works in LLVM IR? If the source code is the reference, could
>>>>> someone with the high level knowledge get me up to speed?
>>>>
>>>> Off the top of my head maybe
>>>> http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html
>>>>
>>>>> Item 1 still confuses me, so I'd be very happy if someone with more
>>>>> insight could clarify if this is the intended behaviour and if so the
>>>>> motivation behind it.
>>>>>
>>>>> As it so happens, I forgot to include item 10 :)
>>>>>
>>>>> * Item 10 - lli vs. clang output
>>>>>
>>>>> Using the same source files as before, it seems like lli and clang
>>>>> treat common linkage and constant variables differently. The
>>>>> following execution demonstrates the return value after executing
>>>>> i.ll, j.ll, k.ll and l.ll with lli and clang respectively:
>>>>>
>>>>>     $ clang i.ll && ./a.out ; echo $?
>>>>>     37
>>>>>     $ lli i.ll ; echo $?
>>>>>     37
>>>>>
>>>>>     $ clang j.ll && ./a.out ; echo $?
>>>>>     0
>>>>>     $ lli j.ll ; echo $?
>>>>>     42
>>>>>
>>>>>     $ clang k.ll && ./a.out ; echo $?
>>>>>     37
>>>>>     $ lli k.ll ; echo $?
>>>>>     37
>>>>>
>>>>>     $ clang l.ll && ./a.out ; echo $?
>>>>>     Segmentation fault
>>>>>     139
>>>>>     $ lli l.ll ; echo $?
>>>>>     37
>>>>
>>>> Some of these linkage combinations and operations have dubious
>>>> semantics. Talking briefly with Rafael Espindola over a build, sounds
>>>> like we should mostly tighten up the verifier to remove some of these
>>>> weird cases. For example, storing to a constant is sort of .... I'm
>>>> sort of surprised it works at all.
>>>>
>>>> -- Sean Silva
>>>>
>>>>> Looking forward to hearing more about type equality, or getting a
>>>>> pointer as to where I can read up about it.
>>>>>
>>>>> Cheers /Robin Eklind
>>>>>
>>>>> On 01/28/2015 03:45 PM, Sean Silva wrote:
>>>>>
>>>>>> A couple quick comments inline (didn't touch on all points):
>>>>>>
>>>>>> On Wed, Jan 28, 2015 at 1:49 AM, Robin Eklind
>>>>>> <carl.eklind at myport.ac.uk> wrote:
>>>>>>
>>>>>>> Hello everyone!
>>>>>>>
>>>>>>> I've recently had a chance to familiarize myself with the
>>>>>>> nitty-gritty details of LLVM IR. It has been a great learning
>>>>>>> experience, sometimes frustrating or confusing but mostly
>>>>>>> rewarding.
>>>>>>>
>>>>>>> There are a few cases I've come across which seem odd to me. I've
>>>>>>> tried to cross reference with the language specification and the
>>>>>>> source code to the best of my abilities, but would like to reach
>>>>>>> out to an experienced crowd with a few questions.
>>>>>>>
>>>>>>> Could you help me out by taking a look at these examples? To my
>>>>>>> novice eyes they seem to highlight inconsistencies in LLVM IR (or
>>>>>>> the reference implementation), but it is quite likely that I've
>>>>>>> overlooked something. Please help me out.
>>>>>>>
>>>>>>> Note: the example source files have been attached and a copy is
>>>>>>> made available at https://github.com/mewplay/ll
>>>>>>>
>>>>>>> * Item 1 - named pointer types
>>>>>>>
>>>>>>> It is possible to create a named array pointer type (and many
>>>>>>> others), but not a named structure pointer type. E.g.
>>>>>>>
>>>>>>>     %x = type [1 x i32]* ; valid.
>>>>>>>     %x = type {i32}*     ; invalid.
>>>>>>>
>>>>>>> Is this the intended behaviour? Attaching a.ll, b.ll, c.ll and d.ll
>>>>>>> for reference. All files except d.ll compile without error using
>>>>>>> clang version 3.5.1 (tags/RELEASE_351/final).
>>>>>>>
>>>>>>>     $ clang d.ll
>>>>>>>     d.ll:3:16: error: expected top-level entity
>>>>>>>     %x = type {i32}*
>>>>>>>                    ^
>>>>>>>     1 error generated.
>>>>>>>
>>>>>>> Does it have anything to do with type equality? (just a hunch)
>>>>>>>
>>>>>>> * Item 2 - equality of named types
>>>>>>>
>>>>>>> A named integer type is equivalent to its literal type counterpart,
>>>>>>> but the same is not true for named and literal structures. I am
>>>>>>> certain that I've read about this before, but can't seem to locate
>>>>>>> the right section of the language specification; could anyone point
>>>>>>> me in the right direction? Also, what is the motivation behind this
>>>>>>> decision? I've skimmed over the code which handles named structure
>>>>>>> types (in lib/IR/core.cpp), but would love to hear the high level
>>>>>>> idea.
>>>>>>>
>>>>>>> Attaching e.ll, f.ll, g.ll and h.ll for reference. All compile just
>>>>>>> fine except h.ll, which produces the following error message (using
>>>>>>> the same version of clang as above):
>>>>>>>
>>>>>>>     $ clang h.ll
>>>>>>>     h.ll:10:23: error: argument is not of expected type '%x = type { i32 }'
>>>>>>>     call void (%x)* @foo({i32} {i32 0})
>>>>>>>                          ^
>>>>>>>     1 error generated.
>>>>>>>
>>>>>>> * Item 3 - zero initialized common linkage variables
>>>>>>>
>>>>>>> According to the language specification common linkage variables
>>>>>>> are required to have a zero initializer [1]. If so, why are they
>>>>>>> also required to provide an initial value?
>>>>>>>
>>>>>>> Attaching i.ll and j.ll for reference. Both compile just fine and
>>>>>>> once executed i.ll returns 37 and j.ll returns 0. If the common
>>>>>>> linkage variable @x was not initialized to 0, j.ll would have
>>>>>>> returned 42.
>>>>>>>
>>>>>>> * Item 4 - constant common linkage variables
>>>>>>>
>>>>>>> The language specification states that common linkage variables may
>>>>>>> not be marked as constant [1]. The parser doesn't seem to enforce
>>>>>>> this restriction. Would doing so cause any problems?
>>>>>>>
>>>>>>> Attaching k.ll and l.ll for reference. Both compile just fine, but
>>>>>>> once executed k.ll returns 37 (i.e. the constant variable was
>>>>>>> overwritten) while l.ll segfaults as expected when it tries to
>>>>>>> overwrite a read-only memory location.
>>>>>>>
>>>>>>> * Item 5 - appending linkage restrictions
>>>>>>>
>>>>>>> An extract from the language specification [1]:
>>>>>>>
>>>>>>>> "appending" linkage may only be applied to global variables of
>>>>>>>> pointer to array type.
>>>>>>>
>>>>>>> Similarly to item 4 this restriction isn't enforced by the parser.
>>>>>>> Would it make sense doing so, or is there any problem with such an
>>>>>>> approach?
>>>>>>>
>>>>>>> * Item 6 - hash token
>>>>>>>
>>>>>>> The hash token (#) is defined in lib/AsmParser/LLToken.h (release
>>>>>>> version 3.5.0 of the LLVM source code) but doesn't seem to be used
>>>>>>> anywhere else in the source tree. Is this token a historical
>>>>>>> artefact or does it serve a purpose?
>>>>>>
>>>>>> Try deleting it. If the tests pass send a patch. Same for item 7.
>>>>>>
>>>>>>> * Item 7 - backslash token
>>>>>>>
>>>>>>> Similarly to item 6 the backslash token doesn't seem to serve a
>>>>>>> purpose (with regards to release version 3.5.0 of the LLVM source
>>>>>>> code). Is it used somewhere?
>>>>>>>
>>>>>>> * Item 8 - quoted labels
>>>>>>>
>>>>>>> A comment in lib/AsmParser/LLLexer.cpp (once again, release version
>>>>>>> 3.5.0 of the LLVM source code) describes quoted labels using the
>>>>>>> following regexp (i.e. at least one character between the double
>>>>>>> quotes):
>>>>>>>
>>>>>>>> /// QuoteLabel "[^"]+":
>>>>>>>
>>>>>>> In contrast the reference implementation accepts quoted labels with
>>>>>>> zero or more characters between the double quotes. Which is to be
>>>>>>> trusted? The comment makes more sense as the variable name would
>>>>>>> effectively be blank otherwise.
>>>>>>
>>>>>> Looks like an empty name just results in the thing becoming unnamed.
>>>>>> That's sort of confusing, but probably not harmful. Maybe we use an
>>>>>> empty name as a sentinel for "unnamed", so it sort of just was an
>>>>>> accident of the implementation.
>>>>>>
>>>>>>> * Item 9 - undocumented calling conventions
>>>>>>>
>>>>>>> The following calling conventions are valid tokens but not
>>>>>>> described in the language references as of revision 223189:
>>>>>>>
>>>>>>> intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc,
>>>>>>> kw_x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc,
>>>>>>> msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func,
>>>>>>> x86_64_sysvcc, x86_64_win64cc, kw_ghccc
>>>>>>
>>>>>> This is just bitrot.
>>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>> Lastly I'd just like to thank the LLVM developers for all the time
>>>>>>> and hard work they've put into this project. I'd especially like to
>>>>>>> thank you for providing a language specification alongside the
>>>>>>> reference implementation! Keeping it up to date is a huge task, but
>>>>>>> also hugely important. Thank you!
>>>>>>>
>>>>>>> Kind regards
>>>>>>> /Robin Eklind
>>>>>>>
>>>>>>> [1]: http://llvm.org/docs/LangRef.html#linkage-types
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
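The four assumptions about named types listed in this message can be written down as a toy model. The Python sketch below is not LLVM's implementation and does not confirm the assumptions; it only encodes them as stated: a name bound to a non-struct type is an alias that is resolved away before comparison, a literal struct compares element by element, and an identified struct compares by name only. All class and function names (`Int`, `Array`, `Pointer`, `LiteralStruct`, `IdentifiedStruct`, `resolve`, `same_type`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Int:
    bits: int


@dataclass(frozen=True)
class Array:
    length: int
    elem: object


@dataclass(frozen=True)
class Pointer:
    pointee: object


@dataclass(frozen=True)
class LiteralStruct:
    # Compared structurally, field by field (assumption 4).
    fields: tuple


@dataclass(frozen=True)
class IdentifiedStruct:
    # Compared by name only; the body is excluded from equality (assumption 3).
    name: str
    fields: tuple = field(compare=False)


def resolve(aliases, ty):
    """Replace alias names (plain strings) by the type they stand for."""
    if isinstance(ty, str):
        return resolve(aliases, aliases[ty])
    if isinstance(ty, Array):
        return Array(ty.length, resolve(aliases, ty.elem))
    if isinstance(ty, Pointer):
        return Pointer(resolve(aliases, ty.pointee))
    if isinstance(ty, LiteralStruct):
        return LiteralStruct(tuple(resolve(aliases, f) for f in ty.fields))
    return ty  # Int and IdentifiedStruct need no further resolution


def same_type(aliases, a, b):
    """Type equality under the stated assumptions (aliases are ignored)."""
    return resolve(aliases, a) == resolve(aliases, b)
```

Under this model, an alias `"x"` bound to `Int(32)` is the same type as `Int(32)`, while `IdentifiedStruct("x", (Int(32),))` differs from `LiteralStruct((Int(32),))`, which matches the behaviour reported for h.ll in item 2.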
Reid Kleckner
2015-Feb-02 21:13 UTC
[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?
I would prefer it if we kept hex integer literals. The .ll syntax mostly exists to support compiler developers reasoning about the code. If you hand write a .ll file, the hex syntax can be very handy. Besides, we need to parse floating point hex constants anyway. On Mon, Feb 2, 2015 at 12:43 PM, Sean Silva <chisophugis at gmail.com> wrote:> > > On Mon, Feb 2, 2015 at 9:51 AM, Robin Eklind <carl.eklind at myport.ac.uk> > wrote: > >> (forgot to cc the list) >> >> Answers, questions and assumptions are inlined in the response. >> >> If someone with knowledge of the LLVM IR type system could take a look at >> my assumptions below I'd be very happy. >> >> On 01/30/2015 02:24 AM, Sean Silva wrote: >> >>> On Thu, Jan 29, 2015 at 10:42 PM, Robin Eklind <carl.eklind at myport.ac.uk >>> > >>> wrote: >>> >>> Thank you for reviewing and commiting the patch Sean :) It was the first >>>> one I've ever submitted to LLVM and the whole process was really smooth! >>>> Using Phabricator with GitHub OAuth login was brilliant as it removed >>>> one >>>> more step for new contributors. I also feel very happy that the first >>>> patch >>>> ended up removing more code than it introduced :) Not likely to speed up >>>> the compilation process by a lot, but one can hope to keep the trend! >>>> >>>> >>> Great! >>> >>> >>> >>>> I read the blog post about the type system rewrite. Thank you for the >>>> link. It did clear up a lot of my uncertainties, but introduced a new >>>> one. >>>> Could you help me make sense of this part, which was presented under the >>>> "Identified structs have a 1-1 mapping with a name" section. >>>> >>>> "... and the only types that can be named are identified structs" >>>>> >>>> >>>> Does this mean that other types cannot be named? What about type type >>>> "%x" >>>> in b.ll? It seems like I'm interpreting this in the wrong way. Could you >>>> help me make this clear? 
Is there a difference between a named type and >>>> an >>>> identified type (or are those two ways of saying the same thing)? If >>>> types >>>> other than structures can be given names, does this name impact type >>>> equality somehow? >>>> >>>> >>> I'll need to punt to someone else for these questions. I haven't dealt >>> with >>> this part of the IR in a while. >>> >>> >> >> Anyone else knowledgeable in this area? I would like to list a set of >> assumptions that I've made after reading the blog post and experimenting >> with the reference implementation. If anyone could verify these >> assumptions, and of cause point out which are incorrect, I'd be very >> grateful. >> >> * Assumption 1 - all types can be given a name, not only structures. >> * Assumption 2 - the type name works as an alias for all types except >> structures, and it is ignored when calculating type equality. >> * Assumption 3 - for structures the type name works as an identity, and >> type equality depends on it. >> * Assumption 4 - type equality is calculated by comparing the base type >> (e.g. the underlying type of a type name identifier) of one type against >> another (recursively and for each element in the case of vectors, arrays >> and other derived types). In the case of identified structures the >> comparison is made strictly based on the structure's name, and in the case >> of structure literals the comparison is made in the same way as for other >> derived types. >> > > There are quite a few people on the list that can answer this. Just a > matter of waiting for one of them to pipe up. > > >> >> >> >>>> To keep up with the spirit of the original topic here are a few more >>>> items >>>> :) >>>> >>>> * Item 11 - hexadecimal integer constants >>>> >>>> The lexer handles hexadecimal integer constants, e.g. 
from >>>> lib/AsmParser/LLLexer.cpp >>>> >>>> /// HexIntConstant [us]0x[0-9A-Fa-f]+ >>>>> >>>> >>>> This representation of integer constants is not mentioned in the >>>> language >>>> specification as far as I can tell. >>>> >>>> >>> I assume you are talking about the 'u' and 's' prefix? That seems like a >>> historical artifact. The type system doesn't have signedness so there is >>> no >>> sense in which a constant can be "signed" or "unsigned". In fact, most >>> places that even look at the signedness of the lexer's APSIntVal it's >>> just >>> to issue an error. A patch removing this old cruft would be great. >>> >>> >> >> I'd be happy to remove this old cruft :) Just want to make sure I >> understood correctly. Are you referring to the prefix or the whole >> HexIntConstant representation? Because if we simply remove the prefix it >> would collide with the hexadecimal representation of floating point >> constants. >> > > If we don't currently accept 0xDEADBEEF as an integer constant, then it's > probably safe to remove HexIntConstant altogether. That u and s prefixed > stuff is clearly out of date by several years, so clearly nobody is relying > on this if that is the only way to get a hex integer constant. > > >> >> It seems like clang has been using HexIntConstants in the past (and maybe >> still?), based on the following comment from lib/AsmParser/LLLexer.cpp: >> >> > // Check for [us]0x[0-9A-Fa-f]+ which are Hexadecimal constant >> generated by >> > // the CFE to avoid forcing it to deal with 64-bit numbers. >> >> Is clang still using this representation? If not, I'll start preparing a >> patch to get rid of the HexIntConstant parsing :) >> > > I don't think any code inside of clang ever directly writes .ll files; it > all happens via the llvm libraries. So all you need to make sure is that > nowhere inside the llvm libraries will write out .ll which has this > construct. 
> > >> >> >>>> * Item 12 - constant expressions >>>> >>>> The documentation of sext states that the bit size of the constant must >>>> be >>>> smaller than the target type, but the implementation also accepts >>>> constants >>>> which have the same size as the target type. E.g. the documentation >>>> should >>>> be updated or the implementation made more strict. >>>> >>>> sext (CST to TYPE) >>>>> Sign extend a constant to another type. The bit size of CST must be >>>>> >>>> smaller than the bit size of TYPE. Both types must be integers. >>>> >>>> The same goes for the trunc, zext, sext, fptrunc and fpext operations. >>>> Some refer to larger instead of smaller but none states that types of >>>> equal >>>> size is allowed. >>>> >>>> >>> Probably worth updating the documentation to what is actually allowed by >>> the code. Could you please send a patch to LangRef? (and for convenience, >>> can you point to the relevant source code for citation?). >>> >>> >> I'll try to look into it. So far I've not found this in the source code, >> but rather by examining the behaviour of compiling .ll files with clang. >> > > Surely there is somewhere in the llvm libraries where we either reject or > accept (through inaction) extension/truncation to types of the same size. > Maybe the verifier? > > >> >> >>>> * Item 13 - LocalVar and LocalID for named types >>>> >>>> This is more of a question. Why are types referred to using local names >>>> "%x" instead of global names "@x"? It seems inconsistent as local names >>>> are >>>> scoped to the function; a local variable name in one function refers to >>>> a >>>> different value from a local variable name in another. Since types are >>>> scoped to the module wouldn't a global name make more sense? >>>> >>>> >>> I doubt there's a particular rationale. I wouldn't pay too much attention >>> to the sigils. They are pretty much arbitrary and just to make the lexer >>> simpler, similar to using introducer keywords makes the parser simpler. 
>>> >>> A more concerning inconsistency regarding sigils (if choice of sigils >>> were >>> to be concerning) is the use of the same sigils for types and values. >>> Types >>> are a purely compile-time thing while locals and globals actually >>> correspond to materializable run-time values (slightly muddled by things >>> like dbg.declare and llvm.assume). >>> >>> >> Would it make sense to start a discussion about this inconsistency where >> the same sigil is used for types and values? It the compatibility between >> releases is ensured using the Bitcode format, it may be possible to >> introduce a patch to the assembly representation of LLVM IR. To port old >> files to the new representation one could convert .ll files to .bc using >> the current version of llvm-as, and then convert back using a newer version >> of llvm-dis. I can understand if this is a low priority issue, but >> discussing and fixing any inconsistency in the language makes sense and >> pays off in the long run. >> > > I don't think anybody really cares about the sigils. They are just there > to simplify the lexer/parser code. In this case, the complexity of > reconstructing the .ll files *including the FileCheck comments* is probably > not worth it (especially since any mistakes effectively end up silently > reducing our test coverage). > > -- Sean Silva > > >> >> >>>> >>>> As always, I'm eager to hear more about the type system in particular. >>>> The >>>> compilation timed in at 120m36.240s while the test cases took >>>> 32m10.111s. >>>> It will be interesting to see if this goes up or down as time passes :) >>>> >>>> >>> Unfortunately probably up. On my main machine in college, a full build of >>> LLVM + Clang took 20 minutes. Last I checked (quite some time ago), that >>> machine took 40 minutes. >>> >>> Also, btw, you can do builddir/bin/llvm-lit llvm/test/path/to/test.ll to >>> run just a single test while iterating (or shell glob a list of tests; or >>> pass a directory). 
There's also a way to run a subset of the unittests, >>> but >>> I forget it off the top of my head. >>> >>> -- Sean Silva >>> >>> >>> >>>> Cheers /Robin Eklind >>>> >>>> >>>> On 01/28/2015 08:31 PM, Sean Silva wrote: >>>> >>>> On Wed, Jan 28, 2015 at 6:28 PM, Robin Eklind < >>>>> carl.eklind at myport.ac.uk> >>>>> wrote: >>>>> >>>>> Hello Sean, >>>>> >>>>>> >>>>>> Thank you for your reply. I'll give your suggestion to item 6 and 7 a >>>>>> try >>>>>> tonight. I'll start a compilation and let it run throughout the >>>>>> night. My >>>>>> laptop (x61s) is 8 years old by know, so compiling LLVM takes a little >>>>>> time >>>>>> :) >>>>>> >>>>>> >>>>>> This is why I did so much documentation work when in college. The >>>>> docs >>>>> build much faster. >>>>> >>>>> >>>>> >>>>> Regarding item 8. I don't know if anyone is using "": in the wild so >>>>>> fixing the implementation might make sense. If not the documentation >>>>>> (e.g. >>>>>> the QuoteLabel comment) should be updated to be in line with the >>>>>> implementation. >>>>>> >>>>>> >>>>>> FYI the textual IR doesn't have a compatibility guarantee (we try >>>>> not to >>>>> egregiously change it, but users don't expect .ll to work across >>>>> versions). >>>>> >>>>> >>>>> >>>>> I only included item 9 since I stumbled upon it once cross-referencing >>>>>> the >>>>>> source code with the language specification. Bitrot for a project of >>>>>> this >>>>>> size is to be expected. >>>>>> >>>>>> I'm still very interested to hear about the items related to types, >>>>>> e.g. >>>>>> item 1 and 2. Is there a good reference which describes how type >>>>>> equality >>>>>> works in LLVM IR? If the source code is the reference, could someone >>>>>> with >>>>>> the high level knowledge get me up to speed? 
>>>>>
>>>>> Off the top of my head maybe
>>>>> http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html
>>>>>
>>>>>> Item 1 still confuses me, so I'd be very happy if someone with more
>>>>>> insight could clarify if this is the intended behaviour and if so
>>>>>> the motivation behind it.
>>>>>>
>>>>>> As it so happens, I forgot to include item 10 :)
>>>>>>
>>>>>> * Item 10 - lli vs. clang output
>>>>>>
>>>>>> Using the same source files as before, it seems like lli and clang
>>>>>> treat common linkage and constant variables differently. The
>>>>>> following execution demonstrates the return value after executing
>>>>>> i.ll, j.ll, k.ll and l.ll with lli and clang respectively:
>>>>>>
>>>>>>     $ clang i.ll && ./a.out ; echo $?
>>>>>>     37
>>>>>>
>>>>>>     $ lli i.ll ; echo $?
>>>>>>     37
>>>>>>
>>>>>>     $ clang j.ll && ./a.out ; echo $?
>>>>>>     0
>>>>>>
>>>>>>     $ lli j.ll ; echo $?
>>>>>>     42
>>>>>>
>>>>>>     $ clang k.ll && ./a.out ; echo $?
>>>>>>     37
>>>>>>
>>>>>>     $ lli k.ll ; echo $?
>>>>>>     37
>>>>>>
>>>>>>     $ clang l.ll && ./a.out ; echo $?
>>>>>>     Segmentation fault
>>>>>>     139
>>>>>>
>>>>>>     $ lli l.ll ; echo $?
>>>>>>     37
>>>>>
>>>>> Some of these linkage combinations and operations have dubious
>>>>> semantics. Talking briefly with Rafael Espindola over a build, sounds
>>>>> like we should mostly tighten up the verifier to remove some of these
>>>>> weird cases. For example, storing to a constant is sort of .... I'm
>>>>> sort of surprised it works at all.
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>> Looking forward to hearing more about type equality, or getting a
>>>>>> pointer as to where I can read up about it.
>>>>>>
>>>>>> Cheers /Robin Eklind
>>>>>>
>>>>>> On 01/28/2015 03:45 PM, Sean Silva wrote:
>>>>>>> A couple quick comments inline (didn't touch on all points):
>>>>>>>
>>>>>>> On Wed, Jan 28, 2015 at 1:49 AM, Robin Eklind
>>>>>>> <carl.eklind at myport.ac.uk> wrote:
>>>>>>>> Hello everyone!
>>>>>>>>
>>>>>>>> I've recently had a chance to familiarize myself with the
>>>>>>>> nitty-gritty details of LLVM IR. It has been a great learning
>>>>>>>> experience, sometimes frustrating or confusing but mostly
>>>>>>>> rewarding.
>>>>>>>>
>>>>>>>> There are a few cases I've come across which seem odd to me. I've
>>>>>>>> tried to cross reference with the language specification and the
>>>>>>>> source code to the best of my abilities, but would like to reach
>>>>>>>> out to an experienced crowd with a few questions.
>>>>>>>>
>>>>>>>> Could you help me out by taking a look at these examples? To my
>>>>>>>> novice eyes they seem to highlight inconsistencies in LLVM IR (or
>>>>>>>> the reference implementation), but it is quite likely that I've
>>>>>>>> overlooked something. Please help me out.
>>>>>>>>
>>>>>>>> Note: the example source files have been attached and a copy is
>>>>>>>> made available at https://github.com/mewplay/ll
>>>>>>>>
>>>>>>>> * Item 1 - named pointer types
>>>>>>>>
>>>>>>>> It is possible to create a named array pointer type (and many
>>>>>>>> others), but not a named structure pointer type. E.g.
>>>>>>>>
>>>>>>>>     %x = type [1 x i32]* ; valid.
>>>>>>>>     %x = type {i32}*     ; invalid.
>>>>>>>>
>>>>>>>> Is this the intended behaviour? Attaching a.ll, b.ll, c.ll and
>>>>>>>> d.ll for reference. All files except d.ll compile without error
>>>>>>>> using clang version 3.5.1 (tags/RELEASE_351/final).
>>>>>>>>
>>>>>>>>     $ clang d.ll
>>>>>>>>     d.ll:3:16: error: expected top-level entity
>>>>>>>>     %x = type {i32}*
>>>>>>>>                    ^
>>>>>>>>     1 error generated.
>>>>>>>>
>>>>>>>> Does it have anything to do with type equality? (just a hunch)
>>>>>>>>
>>>>>>>> * Item 2 - equality of named types
>>>>>>>>
>>>>>>>> A named integer type is equivalent to its literal type
>>>>>>>> counterpart, but the same is not true for named and literal
>>>>>>>> structures. I am certain that I've read about this before, but
>>>>>>>> can't seem to locate the right section of the language
>>>>>>>> specification; could anyone point me in the right direction?
>>>>>>>> Also, what is the motivation behind this decision? I've skimmed
>>>>>>>> over the code which handles named structure types (in
>>>>>>>> lib/IR/core.cpp), but would love to hear the high level idea.
>>>>>>>>
>>>>>>>> Attaching e.ll, f.ll, g.ll and h.ll for reference. All compile
>>>>>>>> just fine except h.ll, which produces the following error message
>>>>>>>> (using the same version of clang as above):
>>>>>>>>
>>>>>>>>     $ clang h.ll
>>>>>>>>     h.ll:10:23: error: argument is not of expected type '%x = type { i32 }'
>>>>>>>>     call void (%x)* @foo({i32} {i32 0})
>>>>>>>>                          ^
>>>>>>>>     1 error generated.
>>>>>>>>
>>>>>>>> * Item 3 - zero initialized common linkage variables
>>>>>>>>
>>>>>>>> According to the language specification common linkage variables
>>>>>>>> are required to have a zero initializer [1]. If so, why are they
>>>>>>>> also required to provide an initial value?
>>>>>>>>
>>>>>>>> Attaching i.ll and j.ll for reference. Both compile just fine and
>>>>>>>> once executed i.ll returns 37 and j.ll returns 0. If the common
>>>>>>>> linkage variable @x was not initialized to 0, j.ll would have
>>>>>>>> returned 42.
>>>>>>>>
>>>>>>>> * Item 4 - constant common linkage variables
>>>>>>>>
>>>>>>>> The language specification states that common linkage variables
>>>>>>>> may not be marked as constant [1]. The parser doesn't seem to
>>>>>>>> enforce this restriction. Would doing so cause any problems?
>>>>>>>>
>>>>>>>> Attaching k.ll and l.ll for reference. Both compile just fine,
>>>>>>>> but once executed k.ll returns 37 (i.e. the constant variable was
>>>>>>>> overwritten) while l.ll segfaults as expected when it tries to
>>>>>>>> overwrite a read-only memory location.
>>>>>>>>
>>>>>>>> * Item 5 - appending linkage restrictions
>>>>>>>>
>>>>>>>> An extract from the language specification [1]:
>>>>>>>>
>>>>>>>>     "appending" linkage may only be applied to global variables
>>>>>>>>     of pointer to array type.
>>>>>>>>
>>>>>>>> Similarly to item 4 this restriction isn't enforced by the
>>>>>>>> parser. Would it make sense to do so, or is there any problem
>>>>>>>> with such an approach?
>>>>>>>>
>>>>>>>> * Item 6 - hash token
>>>>>>>>
>>>>>>>> The hash token (#) is defined in lib/AsmParser/LLToken.h (release
>>>>>>>> version 3.5.0 of the LLVM source code) but doesn't seem to be
>>>>>>>> used anywhere else in the source tree. Is this token a historical
>>>>>>>> artefact or does it serve a purpose?
>>>>>>>
>>>>>>> Try deleting it. If the tests pass send a patch. Same for item 7.
>>>>>>>
>>>>>>>> * Item 7 - backslash token
>>>>>>>>
>>>>>>>> Similarly to item 6 the backslash token doesn't seem to serve a
>>>>>>>> purpose (with regards to release version 3.5.0 of the LLVM source
>>>>>>>> code). Is it used somewhere?
>>>>>>>>
>>>>>>>> * Item 8 - quoted labels
>>>>>>>>
>>>>>>>> A comment in lib/AsmParser/LLLexer.cpp (once again, release
>>>>>>>> version 3.5.0 of the LLVM source code) describes quoted labels
>>>>>>>> using the following regexp (i.e. at least one character between
>>>>>>>> the double quotes):
>>>>>>>>
>>>>>>>>     /// QuoteLabel "[^"]+":
>>>>>>>>
>>>>>>>> In contrast the reference implementation accepts quoted labels
>>>>>>>> with zero or more characters between the double quotes. Which is
>>>>>>>> to be trusted? The comment makes more sense as the variable name
>>>>>>>> would effectively be blank otherwise.
>>>>>>>
>>>>>>> Looks like an empty name just results in the thing becoming
>>>>>>> unnamed. That's sort of confusing, but probably not harmful. Maybe
>>>>>>> we use an empty name as a sentinel for "unnamed", so it sort of
>>>>>>> just was an accident of the implementation.
>>>>>>>
>>>>>>>> * Item 9 - undocumented calling conventions
>>>>>>>>
>>>>>>>> The following calling conventions are valid tokens but not
>>>>>>>> described in the language reference as of revision 223189:
>>>>>>>>
>>>>>>>>     intel_ocl_bicc, x86_stdcallcc, x86_fastcallcc, x86_thiscallcc,
>>>>>>>>     kw_x86_vectorcallcc, arm_apcscc, arm_aapcscc, arm_aapcs_vfpcc,
>>>>>>>>     msp430_intrcc, ptx_kernel, ptx_device, spir_kernel, spir_func,
>>>>>>>>     x86_64_sysvcc, x86_64_win64cc, kw_ghccc
>>>>>>>
>>>>>>> This is just bitrot.
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>> Lastly I'd just like to thank the LLVM developers for all the
>>>>>>>> time and hard work they've put into this project. I'd especially
>>>>>>>> like to thank you for providing a language specification
>>>>>>>> alongside the reference implementation!
>>>>>>>> Keeping it up to date is a huge task, but also hugely important.
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> /Robin Eklind
>>>>>>>>
>>>>>>>> [1]: http://llvm.org/docs/LangRef.html#linkage-types
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
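On item 8 above: a minimal sketch in Python (not LLVM code) of the gap between the documented QuoteLabel pattern and the behaviour Robin reports. The first pattern is copied from the LLLexer.cpp comment quoted in the thread; the second is an assumption that models "zero or more characters" and is not taken from the LLVM sources. The names DOCUMENTED_QUOTE_LABEL, OBSERVED_QUOTE_LABEL and accepted_by are made up for illustration.

```python
import re

# Pattern from the LLLexer.cpp comment: at least one character between the quotes.
DOCUMENTED_QUOTE_LABEL = re.compile(r'"[^"]+":')

# Assumed model of the reported behaviour: zero or more characters are accepted.
OBSERVED_QUOTE_LABEL = re.compile(r'"[^"]*":')


def accepted_by(pattern, text):
    """Return True if the whole text is a quoted label under the given pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for label in ['"foo":', '"":']:
        print(label,
              "documented:", accepted_by(DOCUMENTED_QUOTE_LABEL, label),
              "observed:", accepted_by(OBSERVED_QUOTE_LABEL, label))
```

Under these assumptions the empty label `"":` is rejected by the documented pattern and accepted by the observed one, which is the case Sean describes as the value simply becoming unnamed.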
Sean Silva
2015-Feb-02 21:49 UTC
[LLVMdev] Inconsistencies or intended behaviour of LLVM IR?
On Mon, Feb 2, 2015 at 1:13 PM, Reid Kleckner <rnk at google.com> wrote:

> I would prefer it if we kept hex integer literals. The .ll syntax mostly
> exists to support compiler developers reasoning about the code. If you
> hand write a .ll file, the hex syntax can be very handy. Besides, we
> need to parse floating point hex constants anyway.

What I was saying is that I don't think we have a way to make a hex
integer literal without the 'u' or 's' prefix. I assume nobody uses the
'u' or 's' prefix, so there's no harm in removing the parsing of such
prefixed constants (i.e. all hex integer constants AFAICT).

AFAICT, 0xdeadbeef is considered to have floating point type. E.g.:

    Sean:~/pg/llvm/test/Integer % cat ~/tmp/testhexconstant.ll
    define i32 @foo() {
      ret i32 0xdeadbeef
    }
    Sean:~/pg/llvm/test/Integer % ~/pg/release/bin/llvm-as <~/tmp/testhexconstant.ll
    /Users/Sean/pg/release/bin/llvm-as: <stdin>:2:11: error: floating point constant invalid for type
      ret i32 0xdeadbeef
              ^
    zsh: exit 1     ~/pg/release/bin/llvm-as < ~/tmp/testhexconstant.ll

Grepping the repository finds some interesting related stuff:

    Sean:~/pg/llvm/test % git grep 'i32.*0x'

We seem to have cases where we add explicit comments regarding the hex
form of an integer literal, presumably due to lack of an ability to just
write one, e.g.:

    CodeGen/AArch64/bitfield-insert.ll: %oldval_keep = and i32 %oldval, 2214592511 ; =0x83ffffff

We also seem to have a test verifying that we accept a hacky workaround:

    Feature/fold-fpcast.ll: ret i32 bitcast(float 0x400D9999A0000000 to i32)

-- Sean Silva

> On Mon, Feb 2, 2015 at 12:43 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> On Mon, Feb 2, 2015 at 9:51 AM, Robin Eklind <carl.eklind at myport.ac.uk>
>> wrote:
>>> (forgot to cc the list)
>>>
>>> Answers, questions and assumptions are inlined in the response.
>>>
>>> If someone with knowledge of the LLVM IR type system could take a look
>>> at my assumptions below I'd be very happy.
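On the hex constant question: a minimal Python sketch of how the two token shapes mentioned in this thread differ. It assumes only the HexIntConstant pattern quoted from LLLexer.cpp and the observation that a bare 0x literal is read as floating point; the real lexer has more cases than this, and classify and hex_fp_to_double are hypothetical helper names, not LLVM APIs.

```python
import re
import struct

# Token shapes as discussed in the thread (a simplification of the real lexer).
HEX_INT = re.compile(r"[us]0x[0-9A-Fa-f]+")  # HexIntConstant, needs a 'u' or 's' prefix
HEX_FP = re.compile(r"0x[0-9A-Fa-f]+")       # bare 0x...: treated as floating point


def classify(token):
    """Name the token class a literal would fall into under the two patterns."""
    if HEX_INT.fullmatch(token):
        return "hex integer"
    if HEX_FP.fullmatch(token):
        return "hex floating point"
    return "other"


def hex_fp_to_double(token):
    """Interpret a bare 0x literal (up to 16 hex digits) as IEEE-754 double bits."""
    bits = int(token, 16)
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


if __name__ == "__main__":
    for tok in ["0xdeadbeef", "u0xdeadbeef", "2214592511"]:
        print(tok, "->", classify(tok))
    # The decimal constant from bitfield-insert.ll and its hex comment agree.
    print(hex(2214592511))
    print(hex_fp_to_double("0x400D9999A0000000"))
```

With these definitions, classify("0xdeadbeef") reports a floating point token while classify("u0xdeadbeef") reports an integer one, and hex_fp_to_double("0x400D9999A0000000") comes out at roughly 3.7, the float constant fed to the bitcast in Feature/fold-fpcast.ll.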