On Tue, Nov 10, 2009 at 6:10 PM, Talin <viridia at gmail.com> wrote:
> In my case, I've been attempting to build a target-neutral frontend. In my
> tool chain, the target is specified at link time, not at compile time. Among
> other things, that means that the same IR file can be used for multiple
> targets.

That's the direction I'm going in too.

> What strikes me is how tantalizingly close LLVM comes to being able to do
> this. I am surprised, for example, that I can generate all of the DWARF
> debugging structures without ever having to choose a target machine. Most
> things can be done quite easily without knowing the exact size of a pointer.
> When it comes to being able to "generate once, run anywhere", LLVM is like
> 99.5% of the way there. Which makes that last remaining .5% particularly
> vexing.
>
> There's only a tiny handful of fairly esoteric cases which require selecting
> a target before you generate IR. Unfortunately, the "pointer the same size
> as an int" is one of these rare cases - it is something that is very
> painful to try and work around. (A similar esoteric use case is: "which of
> the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union
> problem.)

I'm willing to spend some time on adding intp to LLVM... my front-end's
standard libraries would be cleaner and more portable that way.
On Tue, Nov 10, 2009 at 6:41 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
> On Tue, Nov 10, 2009 at 6:10 PM, Talin <viridia at gmail.com> wrote:
>> (A similar esoteric use case is: "which of
>> the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union
>> problem.)

The size of a union can be compiled into a ConstantExpr, i.e.,

(sizeof(T1) > sizeof(T2)) ? sizeof(T1) : sizeof(T2)

This works because sizeof(T1) and sizeof(T2) are themselves ConstantExprs,
and so are icmp(ConstantExpr, ConstantExpr) and select(ConstantExpr,
ConstantExpr, ConstantExpr).

You won't be able to tell which is bigger from your front-end, but
you'll have a ConstantExpr that you can feed to malloc, etc.
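
As a rough illustration of the compare-then-select expression above, here is a minimal C++ sketch. It is only an analogy: a C++ compiler already knows its target, whereas the ConstantExpr described in the message would stay unevaluated until target data is available. The names ThreeInts, TwoPtrs, unionSize and kUnionBytes are hypothetical, chosen to mirror the "3 x int32" versus "2 x {}*" example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the two union members from the thread:
// "3 x int32" and "2 x {}*" (modelled here as two void pointers).
struct ThreeInts { std::int32_t a, b, c; };
struct TwoPtrs   { void *p, *q; };

// Same shape as the expression in the message: one comparison feeding a
// select between the two sizes.
template <typename T1, typename T2>
constexpr std::size_t unionSize() {
    return (sizeof(T1) > sizeof(T2)) ? sizeof(T1) : sizeof(T2);
}

// A constant whose value depends on the target's pointer size, so the
// front-end cannot know in advance which member wins.
constexpr std::size_t kUnionBytes = unionSize<ThreeInts, TwoPtrs>();
```

A value like kUnionBytes can be handed to an allocator without the code ever naming which member is larger.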
On Wed, Nov 11, 2009 at 7:00 AM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
> On Tue, Nov 10, 2009 at 6:41 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
>> On Tue, Nov 10, 2009 at 6:10 PM, Talin <viridia at gmail.com> wrote:
>>> (A similar esoteric use case is: "which of
>>> the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union
>>> problem.)
>
> The size of a union can be compiled into a ConstantExpr, i.e.,
>
> (sizeof(T1) > sizeof(T2)) ? sizeof(T1) : sizeof(T2)
>
> This works because sizeof(T1) and sizeof(T2) are themselves ConstantExprs,
> and so are icmp(ConstantExpr, ConstantExpr) and select(ConstantExpr,
> ConstantExpr, ConstantExpr).
>
> You won't be able to tell which is bigger from your front-end, but
> you'll have a ConstantExpr that you can feed to malloc, etc.

Of course, if you try to represent it as an aggregate rather than a block
of memory, you're stuck again, and for the same reason. An array type
can't use a ConstantExpr for its size... it has to be specified as a
literal integer by the front-end. So passing your union as a parameter or
returning it by value won't work... unions can *only* live in memory
unless you've got target data.

Very interesting problem (but one I don't feel ready to even
high-level-design a solution for yet)...
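
The memory-only workaround described above might look roughly like this in C++. This is a sketch under assumptions, not how any particular front-end does it: the union is treated as an opaque heap block whose size comes from the max-of-sizes computation, and members are moved in and out by copying bytes. All names here (ThreeInts, TwoPtrs, unionBytes, allocUnion, storeInts, loadInts) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical union members, mirroring "3 x int32" and "2 x {}*".
struct ThreeInts { std::int32_t a, b, c; };
struct TwoPtrs   { void *p, *q; };

// Size of the union: the larger of the two member sizes.
inline std::size_t unionBytes() {
    return (sizeof(ThreeInts) > sizeof(TwoPtrs)) ? sizeof(ThreeInts)
                                                 : sizeof(TwoPtrs);
}

// The union only ever exists as an opaque block of memory; the caller
// owns the block and must release it with std::free.
inline void *allocUnion() { return std::malloc(unionBytes()); }

// Write the int member into the block by copying its bytes.
inline void storeInts(void *u, const ThreeInts &v) {
    assert(u != nullptr);
    std::memcpy(u, &v, sizeof v);
}

// Read the int member back out of the block.
inline ThreeInts loadInts(const void *u) {
    assert(u != nullptr);
    ThreeInts v;
    std::memcpy(&v, u, sizeof v);
    return v;
}
```

Nothing here ever declares a value of "union type", which is the limitation the message points out: the block can be passed around by pointer, but not as a by-value parameter or return value.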
Kenneth Uildriks wrote:
> On Tue, Nov 10, 2009 at 6:10 PM, Talin <viridia at gmail.com> wrote:
>> In my case, I've been attempting to build a target-neutral frontend. In my
>> tool chain, the target is specified at link time, not at compile time. Among
>> other things, that means that the same IR file can be used for multiple
>> targets.
>
> That's the direction I'm going in too.
>
>> What strikes me is how tantalizingly close LLVM comes to being able to do
>> this. I am surprised, for example, that I can generate all of the DWARF
>> debugging structures without ever having to choose a target machine. Most
>> things can be done quite easily without knowing the exact size of a pointer.
>> When it comes to being able to "generate once, run anywhere", LLVM is like
>> 99.5% of the way there. Which makes that last remaining .5% particularly
>> vexing.
>>
>> There's only a tiny handful of fairly esoteric cases which require selecting
>> a target before you generate IR. Unfortunately, the "pointer the same size
>> as an int" is one of these rare cases - it is something that is very
>> painful to try and work around. (A similar esoteric use case is: "which of
>> the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union
>> problem.)
>
> I'm willing to spend some time on adding intp to LLVM... my
> front-end's standard libraries would be cleaner and more portable that
> way.

Sorry, but I'm still opposed. From your description of 'intp' it sounds like
it's a strict subset of pointers. You can't sext it, zext it or trunc it,
like you can with integers. You can bitcast it, but only to another pointer.

The use case you mentioned was that some native system APIs want integers
that are the same size as pointers. So why not just declare those arguments
or fields with a pointer type in LLVM? Then you've got a field with the
right size.

Nick
On Wed, Nov 11, 2009 at 2:10 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Kenneth Uildriks wrote:
>> On Tue, Nov 10, 2009 at 6:10 PM, Talin <viridia at gmail.com> wrote:
>>> In my case, I've been attempting to build a target-neutral frontend. In my
>>> tool chain, the target is specified at link time, not at compile time. Among
>>> other things, that means that the same IR file can be used for multiple
>>> targets.
>>
>> That's the direction I'm going in too.
>>
>>> What strikes me is how tantalizingly close LLVM comes to being able to do
>>> this. I am surprised, for example, that I can generate all of the DWARF
>>> debugging structures without ever having to choose a target machine. Most
>>> things can be done quite easily without knowing the exact size of a pointer.
>>> When it comes to being able to "generate once, run anywhere", LLVM is like
>>> 99.5% of the way there. Which makes that last remaining .5% particularly
>>> vexing.
>>>
>>> There's only a tiny handful of fairly esoteric cases which require selecting
>>> a target before you generate IR. Unfortunately, the "pointer the same size
>>> as an int" is one of these rare cases - it is something that is very
>>> painful to try and work around. (A similar esoteric use case is: "which of
>>> the following two types is larger, 3 x int32 or 2 x {}*?" -- i.e. the union
>>> problem.)
>>
>> I'm willing to spend some time on adding intp to LLVM... my
>> front-end's standard libraries would be cleaner and more portable that
>> way.
>
> Sorry, but I'm still opposed. From your description of 'intp' it sounds like
> it's a strict subset of pointers. You can't sext it, zext it or trunc it,
> like you can with integers. You can bitcast it, but only to another pointer.

You can do integer arithmetic & bitwise operations with it. You can convert
it to other types of integers, although you wouldn't be able to tell whether
you were truncating or zexting them at IR-generation time (at least not
without target data). You can create literal values of intp type. You can,
of course, safely convert it to/from a pointer.

intp is an integer, not a pointer. It's sized the same as a pointer, so you
can use it as a pointer offset, a size parameter, or something along those
lines, without having to know how big a pointer is.

> The use case you mentioned was that some native system APIs want integers
> that are the same size as pointers. So why not just declare those arguments
> or fields with a pointer type in LLVM? Then you've got a field with the
> right size.

But not necessarily the right alignment. Some platforms align pointers
differently from ints.
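
To make the intp properties listed above concrete, here is a small C++ sketch. It assumes std::uintptr_t as the nearest existing analogue of the proposed intp; the alias intp, the Cast enum, and the helpers castFromIntp and byteOffset are hypothetical names for illustration and are not part of LLVM.

```cpp
#include <cstdint>

// Assumption: std::uintptr_t stands in for the proposed pointer-sized
// integer type `intp`.
using intp = std::uintptr_t;

// Which cast a front-end would have to emit to convert intp to an
// integer that is destBits wide.
enum class Cast { Trunc, ZExt, None };

// The answer depends on the target's pointer width, which is exactly
// what is unknown at IR-generation time without target data.
constexpr Cast castFromIntp(unsigned pointerBits, unsigned destBits) {
    return destBits < pointerBits ? Cast::Trunc
         : destBits > pointerBits ? Cast::ZExt
                                  : Cast::None;
}

// Ordinary integer arithmetic on pointer-sized values: the byte offset
// of p from base, computed without knowing how wide a pointer is.
inline intp byteOffset(const void *base, const void *p) {
    return reinterpret_cast<intp>(p) - reinterpret_cast<intp>(base);
}
```

The same conversion to a 64-bit integer is a zero-extension on a 32-bit target and a no-op on a 64-bit one, which is why the choice cannot be made before a target is selected.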