thr3ads.net - llvm dev - [LLVMdev] Bug in Language Reference? %0 versus %1 as starting index. [Nov 2013]

Mikael Lyngvig

2013-Nov-27 03:35 UTC

[LLVMdev] Bug in Language Reference? %0 versus %1 as starting index.

Without ANY intent of offending anybody, I simply don't like C++.  I coded
in it for some 12 years, from 1990 to 2002, but then I left it behind with
a feeling of happiness.  The main reason I am _trying_ to make a new
language is that I hope to one day come up with something that can help
retire C++.  I love C#, but that language is still too slow for many
demanding problem domains.

That being said, I don't seriously believe I'll ever finish up my own
language, but as long as I am having a good time along the way, I don't
mind.  Now I spend the majority of my spare time on LLVM documentation
(most of it still pending submission because of various factors).  Once the
dust settles from all the documentation projects I've started on (Arch
Linux build doc, Debian build doc, Windows build doc, Mapping High-Level
Constructs to LLVM IR), I plan to resume work on my own language, which
will be something like Python-syntax C# without .NET and perhaps with
optional garbage collection.

Perhaps I'll some day gather up the courage to pick an easy bug report and
fix that, but it is not very likely that I will ever become a core coder on
LLVM.


-- Mikael


2013/11/27 Sean Silva <chisophugis at gmail.com>
>
>
>
> On Tue, Nov 26, 2013 at 9:58 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>
>> Thanks for the lecture :)  But I was not planning on changing a single
>> line in LLVM/Clang.  I stick to the documentation until I've learned to
>> swim, perhaps even forever.  Ah, now I see.  You thought I meant "should I
>> modify the code to do this or that."  I only meant to change the
>> documentation.  Please refer to the patch I've sent on LLVM-commits.
>> That's about what I had in mind.  I am fully aware that you cannot simply
>> dive in and hack away on the handling of the %0 temporary.  I wouldn't ever
>> dream of doing that!
>>
>
> You should dream of doing that. Nobody else has stepped up to do it. Hack
> on the code; ultimately that's where the action is and where you will gain
> understanding.
> (And I'm probably the worst person to give this advice since I do so
> little code hacking during the school year. I swear, I really do prefer
> coding; when I'm at work with a nice fast machine it's a lot nicer to hack,
> but at school with a crappy machine, the situation usually only permits
> reviewing patches on the mailing lists or docs changes.)
>
> AFAIK nobody is an "expert" in that code (it's probably long out of core
> for even the people that wrote it); if you dive into it, you can become a
> local expert in it.
>
> -- Sean Silva
>
>
>>
>>
>> -- Mikael
>>
>>
>>
>>
>> 2013/11/27 Sean Silva <chisophugis at gmail.com>
>>
>>> (gah, this turned into a huge digression, sorry)
>>>
>>> The implicit numbering of BB's seems to be a pretty frequent issue for
>>> people. Surprisingly, the issue boils down to simply changing the IR asm
>>> (.ll file) syntax so that it can have "unnamed BB's" in a recognizable way
>>> that fits in with how unnamed values work (the asmprinter makes an effort
>>> to print a comment with the BB number, but the connection is hard to see
>>> and it's confusing).
>>>
>>> The thing that makes this not-as-easy-as-it-looks is doing it in a way
>>> that preserves compatibility with previous IR (and being able to convince
>>> yourself that this is the case), and the fact that the IR-parsing code is a
>>> bit twisty (it's not bad, but the way that some things work is subtly
>>> different from what you would expect) and you have to find something that
>>> "fits well" with what's there, doesn't require major reworking of the
>>> existing code, etc.
>>>
>>> An alternative approach is to document this issue very clearly. That
>>> might be good in the short term, but IMO the time would be better spent
>>> ruminating over a way to fit this into the syntax, and thinking
>>> deeply/finding a way to convince yourself and others that this change
>>> doesn't break previous .ll files.
>>>
>>> It's just about thinking and coming up with a new syntax that fits well
>>> and that won't break existing .ll files. The key places for making this
>>> round-trip are AssemblyWriter::printBasicBlock in lib/IR/AsmWriter.cpp
>>> and LLParser::ParseBasicBlock in lib/AsmParser/LLParser.cpp. The parsing
>>> side is likely to be entirely in lib/AsmParser/LLLexer.cpp where you need
>>> to find a way to get a new token "LocalLabelID" returned for the new syntax.
>>>
>>> To reiterate, the goal of such a change is solely to avoid people
>>> getting confused about the implicit numbering. It needs to be
>>> reminiscent/suggestive of the instruction numbering syntax to avoid this
>>> confusion.
>>>
>>> Heck, there may be something within the existing syntax that would work
>>> fine for this, but which we can recognize as being "unnamed", rather than a
>>> unique name e.g. currently $1: will give the BB a name "$1" (in the sense
>>> of getName()), and then "$2:" will give a name "$2", etc., which will cause
>>> a lot of pointless string allocations; recognizing a decimal number here
>>> might be all that's needed (and updating the outputting code accordingly),
>>> although I'm not sure a prefix $ is the best syntax.
>>>
>>> Maybe we could even get away with %42: as a BB label and that would be
>>> maximally reminiscent. The way that numbered local variables are handled is
>>> sort of ad-hoc (it is actually also handled in the Lexer; all the parser
>>> sees is lltok::LocalVarID). By just changing LLLexer::LexPercent in
>>> LLLexer.cpp to recognize a local label and emit a "LocalLabelID" token,
>>> then adding an `else if` to the first `if` in LLParser::ParseBasicBlock,
>>> you could probably get a working solution too. However, this introduces an
>>> inconsistency in that now there's this pseudo-common syntax (%[0-9]+) for
>>> unnamed things for both BB's and instructions, but in the case of
>>> instructions, the % sigil is always needed, while the label syntax isn't
>>> sigilized by default, but permits this weird sigilized temporary numbered
>>> form. Maybe that slight inconsistency is worth it? If the inconsistency is
>>> really bothersome, we could also have BB's be able to start sigilized with
>>> % in the other case like instructions are (there is no ambiguity because of
>>> the trailing `:`), but allow the unsigilized versions for compatibility;
>>> this may be more consistent from a semantic perspective too, since we refer
>>> to them sigilized when used as instruction operands.
>>>
>>> Or maybe you could have the BB be numbered just like `42:` without the
>>> sigil. We already lex a label like 42:, but we just have the issue that I
>>> mentioned with $1: that we set this string as the getName() value, which
>>> creates a bunch of useless strings. If you just change the code to emit a
>>> "LocalLabelID" for this case and imitate how we handle locally numbered
>>> instructions, that could be a pretty clean fix. However, that would change
>>> the behavior for how we handle a label like `0:`. For example, with this
>>> behavior, the following IR asm would work:
>>>
>>> define void @foo() {
>>> 0:
>>>   %1 = alloca i8*
>>>   ret void
>>> }
>>>
>>> but with our current behavior we handle `0:` as a BB name and set its
>>> getName() to "0", so it does not take up the first unnamed value slot
>>> (the %0'th one), and you then get an error that %1 should be %0.
>>> This may be an annoying forwards-compatibility issue for a while when we
>>> still have to work with not-trunk LLVM's, and this incompatibility may not
>>> be worth it. Actually all the suggestions that I've made so far have this
>>> same issue :/ Actually I think that it is unsolvable without a
>>> forwards-compatibility break due to this (any label that was previously
>>> accepted would not increment the unnamed local counter, which would cause
>>> all the existing unnamed locals to be off by one and cause an error). We do
>>> break forward-compatibility from time to time (e.g. the syntax for the new
>>> attributes system), so it might not be that big of an issue (although
>>> obviously the community will have to decide about the trade-off for a
>>> temporary nuisance vs. the issue this solves). If breaking
>>> forwards-compatibility is OK, then I would strongly suggest the `0:` syntax
>>> or `%0:`.
>>>
>>> Hopefully I've given you a bit of the flavor of the issues involved.
>>> It's basically just a problem of sitting down and thinking hard, finding
>>> something cleanly-implementable that doesn't break backwards compatibility,
>>> and checking with the community that the syntax is agreeable and that any
>>> forwards-compatibility break is ok.
>>>
>>> -- Sean Silva
>>>
>>>
>>> On Tue, Nov 26, 2013 at 8:02 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>>>
>>>> The language reference states that local temporaries begin with index
>>>> 0, but if I try that on my not-entirely-up-to-date v3.4 llc (it is like a
>>>> week old), I get an error "instruction expected to be numbered '%1'".
>>>>
>>>> Also, quite a few examples in the LR use %0 as a local identifier.
>>>>
>>>> Should I fix those or is it a problem in llc?
>>>>
>>>>
>>>> -- Mikael
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>>
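The numbering rule behind the error in the original question can be illustrated with a toy model. This is only a sketch, not LLVM's implementation: the function name `assign_slots` and the list-of-names input are made up for illustration, and the model assumes a function with no unnamed arguments (those would also consume numbers). Every unnamed entity, whether a basic block or a value-producing instruction, takes the next number in textual order; named entities take none.

```python
def assign_slots(entries):
    """Toy model of per-function implicit numbering (hypothetical helper).

    `entries` lists names in textual order: the entry block first, then
    each value-producing instruction. None means "unnamed".
    Returns the implicit number for unnamed entries and None for named ones.
    """
    next_slot = 0
    slots = []
    for name in entries:
        if name is None:
            # Unnamed blocks and unnamed values share a single counter.
            slots.append(next_slot)
            next_slot += 1
        else:
            # A named entity does not consume a number.
            slots.append(None)
    return slots


# Unnamed entry block, then one unnamed instruction: the block takes 0,
# so the instruction has to be written as %1.
print(assign_slots([None, None]))  # [0, 1]

# Entry block written as `0:` is treated as *named* "0" under the behavior
# described in the thread, so the first unnamed instruction is %0.
print(assign_slots(["0", None]))  # [None, 0]
```

The second call shows why the `0:` example quoted above is rejected under the behavior described in the thread: the label is a name, so `%1` is off by one.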
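The lexer change proposed in the quoted message could look roughly like the following toy classifier. It is a sketch under assumptions, not code from LLLexer.cpp: "LocalLabelID" is the token name suggested in the thread, while `classify`, "LabelStr" and the tuple return format are illustrative choices, and the label character set `[-a-zA-Z$._0-9]` is assumed from the IR syntax. The idea is that a purely numeric label, with or without the `%` sigil, yields a number instead of a name string.

```python
import re

# Order matters: numeric labels must be tried before generic named labels,
# otherwise `42:` would be classified as a label named "42".
_PATTERNS = [
    ("LocalLabelID", re.compile(r"%?(\d+):$")),             # `42:` or `%42:`
    ("LabelStr", re.compile(r"([-a-zA-Z$._0-9]+):$")),      # e.g. `entry:`, `$1:`
    ("LocalVarID", re.compile(r"%(\d+)$")),                 # e.g. `%42`
]


def classify(text):
    """Return (token kind, value) for one already-isolated token."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            value = match.group(1)
            # Numbered tokens carry an integer; named labels keep the string.
            return (kind, value if kind == "LabelStr" else int(value))
    raise ValueError(f"unrecognized token: {text!r}")


print(classify("%42:"))    # ('LocalLabelID', 42)
print(classify("entry:"))  # ('LabelStr', 'entry')
```

Note that this ordering reclassifies existing labels such as `0:`, which is exactly the forwards-compatibility break discussed in the thread.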

Sean Silva

2013-Nov-27 05:24 UTC

On Tue, Nov 26, 2013 at 10:35 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
> Without ANY intent of offending anybody, I simply don't like C++.  I coded
> in it for some 12 years, from 1990 to 2002, but then I left it behind with
> a feeling of happiness.  The main reason I am _trying_ to make a new
> language is that I hope to one day come up with something that can help
> retire C++.  I love C#, but that language is still too slow for many
> demanding problem domains.
>
C++ is far from perfect, but it's pretty amazing (especially with C++11);
I'm frequently shocked by how cleanly things can be implemented. If you
haven't been in contact with the language for a decade+, you may want to
give it another shot. Also, the feedback from every code patch you get
reviewed will move you exponentially closer towards an up-to-date working
knowledge of the language.

It sounds like a large portion of your time with C++ was spent in the
pre-STL days. Personally, grokking the STL is the single biggest thing that
ever happened to me as a programmer, and is the reason that I stick with
C++ (AFAIK there is no other mainstream language that can even model the
STL). If you never had the chance to grok the STL, then I would say
*definitely* give C++ another shot.

-- Sean Silva


Mikael Lyngvig

2013-Nov-27 07:04 UTC

You have a point.  I did code in C++ before the STL became common.  In
fact, most C++ coders' initial take on the STL was that it was too expensive
(in terms of clocks) and too bulky.

I don't know what the status is of the language now, but I fear that it is
still in a sort of everlasting flux.  I got so very tired to the core of my
being because C++ was constantly evolving so that you never knew what
features you could rely on.  When I last seriously used C++, you couldn't
even be sure that your favorite compiler supported bool.  Some did, some
didn't.

On the other hand, I have to recommend to you that you set aside a few
hours to play with C#.  It is sort of the C syntax superimposed upon the
Pascal language, with some features stripped and OOP added.  Anders
Hejlsberg, the guy behind C# and .NET, did Borland Turbo Pascal and
PolyPascal/Compass Pascal before that.  C# is a very laid-back and
easy-going language, where I perceive C++ as a rather harsh and demanding
language.  C# is a very well thought out language, where I personally find
that C++ is more of a trial-and-error evolution of concepts.  I suspect
that the only thing that C++ perhaps has added to the notion of programming
languages is the concept of friend classes and friend operators and I
suspect that these originate elsewhere.

I have tried a couple of times to write my compiler in C++ (primarily to be
able to use the LLVM native C++ interface, but also to be able to do an
interpreter or JIT'ing).  And both times I finally got fed up with C++ and
decided to abandon that path altogether.  A thing that drives me crazy is
that you don't have Ada's and C#'s clean membership operators (dot, every
time), but still suffer from C's variants: ., ->, and in C++ even things
like ->*.  That tires me :-)  I usually try to keep everything pointers
only so that I don't have to switch back and forth between writing . and
->, but that obviously has some impact on performance because you need to
allocate stuff on the heap that could have gone on the stack. The fact that
you have to declare your class members publicly also drives me crazy -
because it is so very hard to avoid exposing underlying operating system
APIs and libraries like LLVM.  My relationship to C++ can best be seen from
the fact that I have decided to write the bootstrap compiler in C#,
generate LLVM IR, and then invoke Clang from it.  I don't know what I will
be doing in the next iteration of the compiler, but I'll figure something
out.  Perhaps just stick with the LLVM IR->Clang method.

Personally, I think that a well-designed language must be designed with LTO
in mind.  Ada was such a language, despite its many flaws, and I know for
sure that my language will be such a language.  This has the added effect
that you do not need to expose any implementation details in the public
interface; the LTO can quietly allocate things on the stack when they are
only used locally and the rest goes on the heap.  I don't think you can do
that sort of trick with any implementation of C++ because everything in the
C++ universe is so C-ishly bound at compile time.

What I should do is to finish up my language and then begin the enormous
project of translating LLVM from C++ to that language.  Now, that would be
interesting!  I'd learn so very much from the project and my own language
would grow rapidly as a result of it being used for something.  But I think
I am dreaming here.

But I'll take your points into consideration.  Because there's no doubt
that I would benefit from learning the STL.  If for nothing else, then to
be able to do something better than that.


-- Mikael


