thr3ads.net - llvm dev - [LLVMdev] dynamic typing system [Aug 2010]

If this information is useful, please help other people find it:
Share via:

Alec Benzer

2010-Aug-16 20:26 UTC

[LLVMdev] dynamic typing system

This isn't a strictly llvm-related problem, but I thought I'd ask anyway
to
see if anyone can help.

I'm trying to write a dynamically typed language on top of llvm. My initial
idea was to have a general object type for all objects in my language. I
came up with:

{ i8, i8* }

the first element of the structure would hold the type of the object, and
the second is a pointer to the actual data.

Now, I'm not exactly sure how to get my data allocated somewhere in order to
be able to get a pointer to it. My initial thought was heap allocations,
though there doesn't seem to be any llvm instructions that perform heap
allocations, though I imagine you just use C's malloc and free? If you do
use those, however, is there a way of getting the byte-size of a type, to
know what to pass to malloc? There's also the issue of having to know when
to be able to free() the pointers.

The other option, I guess, would be stack allocations with alloca
instructions? I don't need to worry about the sizes of types or about
calling free, but now my objects can't live on past the scope of a function,
which may complicate things. For instance, if at my jiting repl (set up like
the Kaleidoscope tutorial, where top-level expressions are wrapped in
lambdas and then executed), I type in "5", the repl should spit 5 back
to
me. If I use allocas here there isn't a problem. But if I define a global
variable and assign 5 to it, the data I alloca'd is going to be gone after
the anonymous function returns. This makes it seem like heap allocations
would be a better choice.

So basically, I'm sort of stuck not knowing the best way to implement this
(or which way will even be possible). I'd appreciate any input/guidance on
how to proceed.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100816/9ed59ba4/attachment.html>

John Criswell

2010-Aug-16 20:53 UTC

head link

[LLVMdev] dynamic typing system

Alec Benzer wrote:> This isn't a strictly llvm-related problem, but I thought I'd ask 
> anyway to see if anyone can help.
>
> I'm trying to write a dynamically typed language on top of llvm. My 
> initial idea was to have a general object type for all objects in my 
> language. I came up with:
>
> { i8, i8* }
>
> the first element of the structure would hold the type of the object, 
> and the second is a pointer to the actual data.
>
> Now, I'm not exactly sure how to get my data allocated somewhere in 
> order to be able to get a pointer to it. My initial thought was heap 
> allocations, though there doesn't seem to be any llvm instructions 
> that perform heap allocations, though I imagine you just use C's 
> malloc and free? If you do use those, however, is there a way of 
> getting the byte-size of a type, to know what to pass to malloc? 
> There's also the issue of having to know when to be able to free() the 
> pointers.
For heap allocations, you need to insert a call to a function that 
performs heap allocations.  Creating a call to malloc is one 
possibility.  However, depending on your front-end language, you may 
wish to use a garbage-collection allocator (e.g., the Boehm conservative 
allocator) or a region-based allocator (e.g., the allocators used in the 
Automatic Pool Allocator and SAFECode projects or the kmem_cache_alloc() 
allocator used in Linux).  Whatever heap allocator you use is a design 
choice for your language.

To get the allocation size, use the TargetData analysis pass.  This pass 
has methods that can tell you the allocation size of various LLVM types.

Note that if you go with a garbage-collected heap allocator, you will 
probably need to do some additional work so that the GC knows where to 
find the roots to use for starting scans of the heap.  I believe LLVM 
provides intrinsics to aid with this.

-- John T.
>
> The other option, I guess, would be stack allocations with alloca 
> instructions? I don't need to worry about the sizes of types or about 
> calling free, but now my objects can't live on past the scope of a 
> function, which may complicate things. For instance, if at my jiting 
> repl (set up like the Kaleidoscope tutorial, where top-level 
> expressions are wrapped in lambdas and then executed), I type in
"5",
> the repl should spit 5 back to me. If I use allocas here there isn't a 
> problem. But if I define a global variable and assign 5 to it, the 
> data I alloca'd is going to be gone after the anonymous function 
> returns. This makes it seem like heap allocations would be a better 
> choice.
>
> So basically, I'm sort of stuck not knowing the best way to implement 
> this (or which way will even be possible). I'd appreciate any 
> input/guidance on how to proceed.

Joachim Durchholz

2010-Aug-16 21:17 UTC

head link

[LLVMdev] dynamic typing system

Am 16.08.2010 22:26, schrieb Alec Benzer:> The other option, I guess, would be stack allocations with alloca
> instructions? I don't need to worry about the sizes of types or about
> calling free, but now my objects can't live on past the scope of a
function,
> which may complicate things.
That's putting it mildly.
If locally created objects can outlive the stack frame, you need a heap. 
(Unless your language is specifically geared towards region analysis, 
but your post doesn't sound as if that's the part of the learning curve 
you're on.)
Some language infrastructures actually allocated everything on the heap. 
Stack-based allocation would be added as an optimization then. (This 
might be reasonable, or the worst idea ever, depending on specifics of 
the language.)

Regards,
Jo

Eugene Toder

2010-Aug-16 21:55 UTC

head link

[LLVMdev] dynamic typing system

If your data needs to outlive the function it was allocated in you
need to use heap allocation (either malloc or a custom allocation/gc
-- just call it as a C function for now, for some allocators you can
inline it directly into your code).
To calculate the size for allocation either use TargetData or use GEP
trick
(http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt).
Note that in your struct you don't gain much by declaring type as i8
-- pointer that follows it will be aligned at 4 or 8 bytes, thus
creating a hole after i8. Type is usually stored as a pointer to some
type info or method table. Dynamically typed languages usually put it
as the first word of every object and use "plain" pointers to objects.
Sometimes "fat" pointers (a pair of pointer to type and pointer to
data) are used.

Eugene

On Mon, Aug 16, 2010 at 9:26 PM, Alec Benzer <alecbenzer at gmail.com>
wrote:> This isn't a strictly llvm-related problem, but I thought I'd ask
anyway to
> see if anyone can help.
> I'm trying to write a dynamically typed language on top of llvm. My
initial
> idea was to have a general object type for all objects in my language. I
> came up with:
> { i8, i8* }
> the first element of the structure would hold the type of the object, and
> the second is a pointer to the actual data.
> Now, I'm not exactly sure how to get my data allocated somewhere in
order to
> be able to get a pointer to it. My initial thought was heap allocations,
> though there doesn't seem to be any llvm instructions that perform heap
> allocations, though I imagine you just use C's malloc and free? If you
do
> use those, however, is there a way of getting the byte-size of a type, to
> know what to pass to malloc? There's also the issue of having to know
when
> to be able to free() the pointers.
> The other option, I guess, would be stack allocations with alloca
> instructions? I don't need to worry about the sizes of types or about
> calling free, but now my objects can't live on past the scope of a
function,
> which may complicate things. For instance, if at my jiting repl (set up
like
> the Kaleidoscope tutorial, where top-level expressions are wrapped in
> lambdas and then executed), I type in "5", the repl should spit 5
back to
> me. If I use allocas here there isn't a problem. But if I define a
global
> variable and assign 5 to it, the data I alloca'd is going to be gone
after
> the anonymous function returns. This makes it seem like heap allocations
> would be a better choice.
> So basically, I'm sort of stuck not knowing the best way to implement
this
> (or which way will even be possible). I'd appreciate any input/guidance
on
> how to proceed.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

John McCall

2010-Aug-16 22:02 UTC

head link

[LLVMdev] dynamic typing system

On Aug 16, 2010, at 1:26 PM, Alec Benzer wrote:> This isn't a strictly llvm-related problem, but I thought I'd ask
anyway to see if anyone can help.
>
> I'm trying to write a dynamically typed language on top of llvm. My
initial idea was to have a general object type for all objects in my language. I
came up with:
This is a huge subject, but briefly:
1. Unless your type system is really weird, you will need a garbage-collected
heap. The design and quality of your GC will determine a lot of the performance
of your language. However, it is relatively easy to improve your garbage
collector and allocator without fundamentally changing your object
representation.
2. You should design your object representation to make it as cheap as possible
to create and recognize the values you expect to be working with the most. If
your dynamic programs are going to do a lot of text processing, don't make
it expensive to check whether a value is a string. In particular, booleans and
small integers should be easy to recognize.
3. If you have any control over your language's semantics, I strongly
suggest avoiding implicit coercions; it should be a (dynamic) type error to use
a string where an integer was expected. This can substantially cut down on the
amount of code you have to generate.
4. So-called "fat pointers" (pointers that include extra bytes of
information, like {i8, i8*}) can significantly increase the amount of memory you
use, since most architectures require pointers to be aligned on 4- or 8-byte
boundaries (or perform better when they are). Try to see if you can get away
with tagged pointers, i.e. using the low bits of a value to determine how to
interpret the rest of it. Getting good low-level performance out of a
dynamically-typed language requires a lot of type-unsafe tricks like this.

John.

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - Aug 2010 - [LLVMdev] dynamic typing system

[LLVMdev] dynamic typing system

[LLVMdev] dynamic typing system

[LLVMdev] dynamic typing system

[LLVMdev] dynamic typing system

[LLVMdev] dynamic typing system

Reasonably Related Threads