John Reagan via llvm-dev
2016-Nov-01 14:14 UTC
[llvm-dev] What was the IR made for precisely?
> Date: Tue, 1 Nov 2016 11:31:05 +0000
> From: David Chisnall via llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] What was the IR made for precisely?
>
> On 28 Oct 2016, at 21:25, Hal Finkel <hfinkel at anl.gov> wrote:
> >
> > ----- Original Message -----
> >> From: "Chris Lattner via llvm-dev" <llvm-dev at lists.llvm.org>
> >> To: "David Chisnall" <David.Chisnall at cl.cam.ac.uk>
> >> Cc: llvm-dev at lists.llvm.org, "ジョウェットジェームス" <b3i4zz1gu1 at docomo.ne.jp>
> >> Sent: Friday, October 28, 2016 2:13:06 PM
> >> Subject: Re: [llvm-dev] What was the IR made for precisely?
> >>
> >>
> >>> On Oct 28, 2016, at 1:21 AM, David Chisnall
> >>> <David.Chisnall at cl.cam.ac.uk> wrote:
> >>>
> >>> On 28 Oct 2016, at 02:43, ジョウェットジェームス <b3i4zz1gu1 at docomo.ne.jp>
> >>> wrote:
> >>>>
> >>>> I would need to sum up all the rules and ABIs and sizes for all the
> >>>> targets I need and generate different IR for each, am I correct?
> >>>
> >>> This is a long-known limitation of LLVM IR and there are a lot of
> >>> proposals to fix it. It would be great if the LLVM Foundation would
> >>> fund someone to do the work, as it isn’t a sufficiently high
> >>> priority for any of the large LLVM consumers and would make a huge
> >>> difference to the utility of LLVM for a lot of people.
> >> …
> >>> I think it would be difficult to do it within the timescale of the
> >>> GSoC unless the student was already an experienced LLVM developer.
> >>> It would likely involve designing some good APIs (difficult!),
> >>> refactoring a bunch of Clang code, and creating a new LLVM library.
> >>> I’ve not seen a GSoC project on this scale succeed in any of the
> >>> open source projects that I’ve been involved with. If we had a good
> >>> design doc and a couple of engaged mentors then it might stand a
> >>> chance.
> >>
> >> Is there a specific design that you think would work?
> >> One of the major problems with this sort of proposal is that you need
> >> the entire clang type system to do this, which means it depends on a
> >> huge chunk of the Clang AST. At that point, this isn’t a small library
> >> that clang uses, this is a library layered on top of Clang itself.
> >
> > Given that ABIs are defined in terms of C (and sometimes now C++)
> > language constructs, I think that something like this is the best of
> > all bad options. Really, however, it depends only on the AST and
> > CodeGen, and maybe those (along with 'Basic', etc.) could be made into
> > a separately-compilable library. Along with an easy ASTBuilder for C
> > types and function declarations we should be able to satisfy this use
> > case.
>
> Indeed. Today, I can go and get the MIPS, ARM, x86-64, or whatever ABI
> specification and it defines how all of the C types map to in-memory
> types and where arguments go. We currently have no standard for how
> any of this is represented in IR, and I have to look at what clang
> generates if I want to generate C-compatible IR (and this is not stable
> over time - the contract between clang and the x86 back end has changed
> at least once that I remember). The minimum that you need to be able
> to usefully interoperate with C is:
>
> - The ability to map each of the C types (int, long, float, double) to
>   the corresponding LLVM type.
>
> - The ability to generate an LLVM struct that corresponds to a
>   particular C struct (including loads and stores from struct members)
>
> - The ability to construct functions that have a C API signature and
>   call functions that have such a signature.
>
> We’ve discussed possible APIs for this in the Cambridge LLVM Socials a
> couple of times. I think that the best proposal was along the
> following lines:
>
> A new CABIBuilder that handles constructing C ABI elements.
> This would have the primitive C types as static fields and would allow
> you to construct a C struct type by passing C types (primitives or other
> structs, optionally with array sizes). From this it would construct an
> LLVM struct and provide IRBuilder-like methods for constructing GEPs to
> specific fields (and probably loads and stores to bitfields).
>
> The same approach would be used for functions and calls. Once you’ve
> built the CFunctionType from C structs and primitives for arguments,
> you would have an analogue of IRBuilder’s CreateCall / CreateInvoke
> that would take the IR types that correspond to the C types and marshal
> them correctly.
>
> On the other side of the call (constructing a C ABI function by passing
> a set of C types to the builder), you’d get an LLVM Function that took
> the arguments in whatever LLVM expects and then stores them into
> Allocas, which would be returned to the callee, so the front-end author
> would never need to look at the explicit parameters.
>
> You’d need a small subset of Clang’s AST for this (none of the stuff
> for builtins, nothing for C++ / Objective-C, and so on) and several of
> the bits of CodeGen (in particular, CGTargetInfo contains a bunch of
> stuff that really should be in LLVM, for example with respect to
> variadics). It’s a big bit of refactoring work, and a lot of it would
> probably need to end up duplicated in both clang and LLVM (though it
> should be easy to automate the testing).
>
> Another alternative is to expose these APIs from Clang itself, so
> if you need them then you will have to link clang’s Basic, AST and
> CodeGen libraries (which is only about 10MB in a release build and
> could be dynamically linked if they’re used by multiple things). This
> approach would also make it easier to extend the interfaces to allow
> header parsing and C++ interop (which would be nice for using things
> like shared_ptr across FFI boundaries).
>
> David

I'd prefer not to expose them via Clang; that would just add another dependency for those of us who generate our own IR and hand it directly to LLVM without Clang. Right now, we have a converter from our own backend IR to LLVM IR so that we can port our entire compiler suite (BASIC, COBOL, Pascal, Fortran, C). We've had to peek at what Clang generates to figure out the correct mapping for us. Even with some CABIBuilder, I think we might have to go deeper. For example, we aren't using IRBuilder at the moment, since we end up building the IR in pieces and stitching it all back together at the end of the conversion process.
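
To make the per-target mapping problem concrete, here is a minimal sketch of the bookkeeping a front end ends up doing by hand today. It is written in Python only for brevity and is not LLVM or Clang API. The names (CType, TARGETS, layout_struct) and the target keys are hypothetical, and the two tables are illustrative values in the style of the System V x86-64 and i386 ABIs. The layout rule is plain natural alignment, so it ignores bitfields, packed structs, nested structs and everything about argument passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CType:
    """A C type as one target sees it (hypothetical helper, not an LLVM class)."""
    name: str
    size: int   # size in bytes
    align: int  # alignment in bytes when used as a struct member


# Illustrative per-target tables (assumed values, check the real ABI documents).
TARGETS = {
    "x86_64-sysv": {
        "char": CType("char", 1, 1),
        "int": CType("int", 4, 4),
        "long": CType("long", 8, 8),
        "float": CType("float", 4, 4),
        "double": CType("double", 8, 8),
    },
    "i386-sysv": {
        "char": CType("char", 1, 1),
        "int": CType("int", 4, 4),
        "long": CType("long", 4, 4),
        "float": CType("float", 4, 4),
        "double": CType("double", 8, 4),
    },
}


def align_up(offset, align):
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def layout_struct(target, name, fields):
    """Lay out a struct for one target.

    fields is a list of (field_name, c_type_name, element_count) tuples;
    element_count is 1 for a scalar and N for an array of N elements.
    Returns (CType describing the struct, dict of field byte offsets).
    """
    table = TARGETS[target]
    offset = 0
    max_align = 1
    offsets = {}
    for field_name, type_name, count in fields:
        ty = table[type_name]
        offset = align_up(offset, ty.align)  # insert padding before the field
        offsets[field_name] = offset
        offset += ty.size * count
        max_align = max(max_align, ty.align)
    # Tail padding so that arrays of the struct stay aligned.
    return CType(name, align_up(offset, max_align), max_align), offsets


# struct S { char c; double d; int i; };
FIELDS = [("c", "char", 1), ("d", "double", 1), ("i", "int", 1)]

for target in TARGETS:
    ty, offsets = layout_struct(target, "S", FIELDS)
    print(target, "size", ty.size, "align", ty.align, offsets)
```

With these tables the same three-field struct comes out at 24 bytes with d at offset 8 for the x86-64 entry, and at 16 bytes with d at offset 4 for the i386 entry, which is the kind of difference that forces different IR per target. A CABIBuilder along the lines David describes would own tables and rules like these, so that each front end would not have to rediscover them from Clang's output.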