thr3ads.net - llvm dev - [LLVMdev] "Mapping High-Level Constructs to LLVM IR" [Nov 2013]

Mikael Lyngvig

2013-Nov-23 03:25 UTC

[LLVMdev] "Mapping High-Level Constructs to LLVM IR"

Hi guys,

I have begun writing on a new document, named "Mapping High-Level
Constructs to LLVM IR", in which I hope to eventually explain how to map
pretty much every contemporary high-level imperative and/or OOP language
construct to LLVM IR.

I write it for two reasons:

1. I need to know this stuff myself to be able to continue on my own
language project.
2. I feel that this needs to be documented once and for all, to save tons
of time for everybody out there, especially for the language inventors who
just want to use LLVM as a backend.

So my plan is to write this document and continue to revise and enhance it
as I understand more and helpful people on the list and elsewhere explain
to me how these things are done.

Basically, I just want to know if there is any interest in such a document
or if I should put it on my own website.  If you know of any books or
articles that already do this, then please let me know about them.

I've attached the result of 30 minutes work, just so that you can see what
I mean.  Please don't review the document as it is still in its very early
infancy.


Cheers,
Mikael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MappingHighLevelConstructsToLLVMIR.rst
Type: application/octet-stream
Size: 7010 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131123/616033e8/attachment.obj>

Sean Silva

2013-Nov-23 03:45 UTC

On Fri, Nov 22, 2013 at 10:25 PM, Mikael Lyngvig <mikael at lyngvig.org>
wrote:
> Hi guys,
>
> I have begun writing on a new document, named "Mapping High-Level
> Constructs to LLVM IR", in which I hope to eventually explain how to map
> pretty much every contemporary high-level imperative and/or OOP language
> construct to LLVM IR.
>
> I write it for two reasons:
>
> 1. I need to know this stuff myself to be able to continue on my own
> language project.
> 2. I feel that this needs to be documented once and for all, to save tons
> of time for everybody out there, especially for the language inventors who
> just want to use LLVM as a backend.
>
We get questions like "how do I implement a string type in llvm"
frequently, so something like this is probably useful.

>
> So my plan is to write this document and continue to revise and enhance it
> as I understand more and helpful people on the list and elsewhere explain
> to me how these things are done.
>
> Basically, I just want to know if there is any interest in such a document
> or if I should put it on my own website.  If you know of any books or
> articles that already do this, then please let me know about them.
>
> I've attached the result of 30 minutes work, just so that you can see what
> I mean.  Please don't review the document as it is still in its very early
> infancy.
>
I feel like the "lowering it to C" part is part of the typical "low level
curriculum" that is unfortunately not taught anywhere really, but I feel is
common knowledge among ... I'm not sure who, but I'm pretty sure that
almost all LLVM developers picked it up somehow, somewhere (I honestly
don't know where I did...). I would try to investigate if there is an
alternative place where these things are discussed better, since I feel
like this is not very LLVM-specific knowledge.

For covering this sort of thing inside the LLVM docs, the best way I can
think to do so is to improve docs/tutorial/ to just add those features.

If you are implementing a language, a good topic for a document that you
can probably help with is the following:
LLVM doesn't provide a "complete portable runtime environment"; you still
need to know how to e.g. get access to malloc, or link with system
libraries for basic functionality, etc. What sorts of "glue" work like the
above does a language implementor typically have to do in order to make a
runnable language?

-- Sean Silva



>
>
> Cheers,
> Mikael
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Mikael Lyngvig

2013-Nov-23 04:16 UTC

Yes, it is sort of scary that there seems to be no definitive resource to
consult for information on advanced OOP things in particular.  I have
something like 20 compiler books on my shelves and yet none of them mention
a single word on how to implement multiple inheritance, exception handling,
or interfaces.  It seems like they are all happy about presenting a subset
of Pascal, or Java, with an integer and boolean type, and then let the
reader figure out all the hard stuff by himself or herself.  I know I
picked most of my knowledge up from doing lots of debug info converters
back when C++ was just coming to market - the early C-only debuggers didn't
understand C++ so the quick solution was to convert C++ debug info into
lowered C debug info :-)  If you know of any compiler book that covers
advanced OOP topics, please feel free to let me know the title/ISBN of it.

I was thinking of making a tutorial sometime down the road, once I have
grasped enough of LLVM myself.  For the time being, I think it makes
a lot of sense to first transform into C (a language known by most) and
then take it one level further by transforming into LLVM IR.  I cannot
promise that I will make such a tutorial, but the outcome of this
sub-project should definitely make it much easier to make such a tutorial.
Personally, I think the lowering into C makes the difficult topics very
easy to understand: Once you've understood how to lower a C++ class into a
C structure with associated functions, almost all of the magic of the C++
class vanishes and you're set to go.
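
To make the lowering of a C++ class into a C structure with associated
functions concrete, here is a minimal sketch in C.  All names (Shape, Square,
ShapeVTable, Square_init, Shape_area) are hypothetical, and the layout is
only one plausible choice; real compilers follow a platform ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C++ source being lowered:
 *   class Shape  { public: virtual int area() const = 0; };
 *   class Square : public Shape { int side; public: int area() const; };
 */

struct Shape;  /* forward declaration for the vtable signature */

/* One table of function pointers per class; 'self' replaces 'this'. */
struct ShapeVTable {
    int (*area)(const struct Shape *self);
};

/* The base object holds only a pointer to its class's vtable. */
struct Shape {
    const struct ShapeVTable *vtable;
};

/* Single inheritance: the base sub-object is the first member, so a
 * pointer to a Square can be converted to a pointer to its Shape. */
struct Square {
    struct Shape base;
    int side;
};

/* Square::area() lowered to a free function with an explicit 'self'. */
static int Square_area(const struct Shape *self)
{
    const struct Square *sq = (const struct Square *)self;
    return sq->side * sq->side;
}

static const struct ShapeVTable Square_vtable = { Square_area };

/* The constructor becomes an initializer that also sets the vtable. */
static void Square_init(struct Square *sq, int side)
{
    sq->base.vtable = &Square_vtable;
    sq->side = side;
}

/* A virtual call 'shape->area()' becomes an indirect call. */
static int Shape_area(const struct Shape *shape)
{
    assert(shape != NULL && shape->vtable != NULL);
    return shape->vtable->area(shape);
}
```

A caller would declare a struct Square, run Square_init on it, and pass
the address of its base member to Shape_area; the call is dispatched through
the function pointer rather than resolved at compile time.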

I envision LLVM as a sort of "compiler builders' power toolbox" and along
with that vision goes a lot of great documentation that makes you fly in no
time.  To be honest, I personally feel that the learning curve is a bit
steep. From building with a non-Microsoft compiler (on Windows), to
setting up a buildbot slave, to implementing an actual language using LLVM,
I think the path is rather difficult.  It may just be me, or it may also be
that you LLVM gurus have breathed LLVM for so long that you no longer
remember that not everybody getting in touch with LLVM actually wants to
code on LLVM.

I'll add a section on "How to Interface to the Operating System" to the
article.  I'll focus on interfacing directly with the host operating system
as I personally dislike the C run-time library for a lot of reasons (ever
met a programmer who routinely checked errno? - I haven't, although I have
once or twice in my life seen Unix code that actually did bother to check
errno).
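
For reference, routinely checking errno with the C run-time library looks
roughly like the sketch below.  The helper name parse_long and its 0 / -1
return convention are assumptions made for this illustration; strtol and
ERANGE are standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
static int parse_long(const char *text, long *out)
{
    char *end = NULL;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;                        /* strtol never clears errno itself */
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')  /* no digits, or trailing garbage */
        return -1;
    if (errno == ERANGE)              /* value does not fit in a long */
        return -1;

    *out = value;
    return 0;
}
```

The caller has to clear errno before the call and inspect it afterwards,
because the return value alone cannot distinguish an out-of-range input from
a legitimate result.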

Perhaps it is better if I plan from the start to release this article on my
own web site?  I don't want to impose it on you guys, but I definitely want
to do it.  I know there are more than 5,000 programming languages in the
world and if only half the developers were to adopt LLVM, things would
surely look much brighter with respect to the number of contributors of
builders, documentation, and actual code patches.

-- Mikael


Joshua Cranmer 🐧

2013-Nov-23 05:54 UTC

On 11/22/2013 9:25 PM, Mikael Lyngvig wrote:
> Hi guys,
>
> I have begun writing on a new document, named "Mapping High-Level 
> Constructs to LLVM IR", in which I hope to eventually explain how to 
> map pretty much every contemporary high-level imperative and/or OOP 
> language construct to LLVM IR.
>
> I write it for two reasons:
>
> 1. I need to know this stuff myself to be able to continue on my own 
> language project.
> 2. I feel that this needs to be documented once and for all, to save 
> tons of time for everybody out there, especially for the language 
> inventors who just want to use LLVM as a backend.
>
> So my plan is to write this document and continue to revise and 
> enhance it as I understand more and helpful people on the list and 
> elsewhere explain to me how these things are done.
>
> Basically, I just want to know if there is any interest in such a 
> document or if I should put it on my own website.  If you know of any 
> books or articles that already do this, then please let me know about 
> them.
>
> I've attached the result of 30 minutes work, just so that you can see 
> what I mean.  Please don't review the document as it is still in its 
> very early infancy.
There is a strong bias towards C++ in the document, which isn't a 
particularly strong slice of higher-level constructs. For example, C++'s 
RTTI constructs serve three distinct purposes: exception handling, 
dynamic casts, and reflection (although C++'s reflection capabilities 
are extremely weak). You'll need to talk about inheritance in the three 
cases: single, multiple, and virtual (to use C++'s terminology) (note 
that Java's interfaces can be implemented as virtual inheritance). 
Boxing is another important topic. Lambdas, closures, and generators 
(yield keyword) are becoming increasingly common in modern programming 
languages, and should not be ignored.
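
As one illustration of the constructs listed above, a closure is commonly
lowered to a code pointer paired with a heap-allocated record of captured
variables.  The sketch below shows that idea in C; the names (Closure,
AdderEnv, make_adder, closure_call, closure_free) are hypothetical, and
ownership of the environment is handled manually here, which a real language
runtime would do differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical source construct:
 *   make_adder(n) returns a function  x -> x + n
 */

/* A closure value: code pointer plus pointer to the captured variables. */
struct Closure {
    int (*code)(void *env, int x);
    void *env;
};

/* The captured variables of the 'adder' lambda. */
struct AdderEnv {
    int n;
};

/* The lambda body, with the environment as an explicit first parameter. */
static int adder_code(void *env, int x)
{
    const struct AdderEnv *captured = env;
    return x + captured->n;
}

/* Creating the closure copies the captured value into a heap record.
 * Returns 0 on success, -1 if the allocation fails. */
static int make_adder(int n, struct Closure *out)
{
    struct AdderEnv *env = malloc(sizeof *env);
    if (env == NULL)
        return -1;
    env->n = n;
    out->code = adder_code;
    out->env = env;
    return 0;
}

/* Calling a closure passes its own environment back to its code. */
static int closure_call(const struct Closure *c, int x)
{
    assert(c != NULL && c->code != NULL);
    return c->code(c->env, x);
}

/* Releases the environment; the closure must not be called afterwards. */
static void closure_free(struct Closure *c)
{
    free(c->env);
    c->env = NULL;
    c->code = NULL;
}
```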

Finally, calling propagated return values "exception handling" does an
extreme disservice to your readers. LLVM IR explicitly models exception 
handling, and attempting to describe it lowered as return values is not 
how anyone should implement it. If you badly want to describe it in C 
terms, you could at least use C's setjmp/longjmp to describe it; the 
truth is, this is a feature which doesn't exist cleanly in C.
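
A minimal sketch of the setjmp/longjmp description mentioned above, in C.
The names (current_handler, checked_divide, divide_or_default) are
hypothetical, and the single global handler is a simplification that supports
neither nesting nor threads.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical handler "stack" of depth one: the innermost active try. */
static jmp_buf current_handler;

/* "throw": transfers control back to the most recent setjmp.
 * The value passed to longjmp must be non-zero. */
static int checked_divide(int a, int b)
{
    if (b == 0)
        longjmp(current_handler, 1);
    return a / b;
}

/* "try { return a / b; } catch (...) { return fallback; }" */
static int divide_or_default(int a, int b, int fallback)
{
    if (setjmp(current_handler) == 0) {
        /* First return from setjmp: run the protected code. */
        return checked_divide(a, b);
    }
    /* Second return, reached via longjmp: the "catch" block. */
    return fallback;
}
```

Frames skipped by longjmp get no chance to run cleanup code, which is one
reason this only approximates exception handling.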

Trying to describe mapping higher-level languages to C and then C to IR 
is a poor idea. C is in some ways an extremely limited language (no 
native exception handling constructs, e.g.). If you want to be a guide 
to how to lower languages to LLVM IR, you need to also explain how to 
take advantage of features in the IR to optimize code better (e.g., 
TBAA). Cfront-like C++ compilers are extremely rare-to-nonexistent (in 
part because it is difficult to map some features, most notably 
exception handling, cleanly and efficiently into C); if your guide is 
describing such an approach, it reads like an implicit endorsement. It 
is possible to describe some aspects of the IR in C, but if the goal is 
to lower to IR, then the description should be lowering to IR, not 
lowering to C.

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Mikael Lyngvig

2013-Nov-23 06:18 UTC

Thanks, you have a lot of valid points there.  I have myself long ago
abandoned the path of using C as a backend language due to the very factors
you mention.

However, as I said, the document was put together in 30 minutes.  Not
exactly ready for prime time :-)

I do agree that all of the things you mention should be described,
including Lambdas, closures, and generators, but I must admit up front that
I don't know how to implement half of them.  But I suppose I could do a lot
of research and perhaps occasionally ask you guys for specifics.

We are not going to find much common ground on the issue of "calling
propagated return values for exception handling", I think :-)  See
https://www.lyngvig.org/Teknik/A-Proposal-for-Exception-Handling-in-C for
the details.

I started out with C++ as the example language because a lot of people know
that language - and most certainly the majority of the LLVM user base.
Obviously, you'd have to add source code from other languages than C++
when C++ does not provide features to illustrate the process.

I now agree that the lowering into C is not such a good idea after all.  So
I'll go straight from source language to LLVM IR, which is not that
difficult after all, and won't be very different for the reader.  In fact,
I think it will be much better than my original approach.

Thanks again for your valid objections.


-- Mikael




