Hello,
I'd like to use LLVM as a code generator for a runtime environment
(thread scheduler + malloc + GC) I'm in the final stages of developing.
Here is a brief overview: The system maintains a constant number of
execution resources (either system threads or CPUs), which I call
"executors". Threads themselves are very lightweight, consisting only
of seven words of information, plus scheduling overhead. A scheduler
based on lock-free structures maps threads onto executors. The
scheduler communicates with threads using mailboxes, and relies on
voluntary context switches. The system provides a copying garbage
collector (similar to Cheng/Blelloch, but based on lock-free data
structures). The assumption is that threads will use the GC allocator
to allocate their frames.
My specific needs are as follows:
- Each thread has a mailbox which contains the following information:
the ID of the executor running it, the current GC status, any signals
that have been sent to the thread, its GC allocation pointer, the
limit of its current GC allocation block, a pointer to its GC write
log, and the number of log entries available.
- Safepoints are necessary, both for the scheduler to work correctly
and for the GC. The thread-specific data I just mentioned are assumed
to be overwritten at each safepoint. In between safepoints, however,
they can be assumed to be non-volatile.
- All pointers to GCed objects are represented as pairs of pointers.
Which pointer is in use depends on the GC status word.
- If the GC status indicates that GC is underway, all writes to GC
memory must be logged before reaching the next safepoint. Multiple
writes to the same location only need to be logged once before
reaching the safepoint, though.
- GC memory is allocated by advancing the GC allocation pointer. If
the allocation block runs out, then a function is called to obtain
another one.
- Heap objects need to have specifically formatted headers, and I need
to generate specifically formatted type signatures.
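
To make the mailbox and allocation requirements concrete, here is a
rough sketch in C. The field names, the field order, BLOCK_SIZE, and
refill_block() are placeholders I made up for illustration, and
refill_block() just calls malloc as a stand-in for the runtime function
that hands out a new block. It also assumes no single object is larger
than one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { BLOCK_SIZE = 4096 };  /* assumed block size, for illustration only */

/* Hypothetical mailbox layout: seven machine words, one per item. */
typedef struct mailbox {
    uintptr_t executor_id;  /* ID of the executor running the thread */
    uintptr_t gc_status;    /* current GC status word */
    uintptr_t signals;      /* signals sent to the thread */
    uintptr_t alloc_ptr;    /* GC allocation pointer */
    uintptr_t alloc_limit;  /* limit of the current allocation block */
    void    **write_log;    /* pointer to the GC write log */
    uintptr_t log_avail;    /* number of log entries available */
} mailbox_t;

/* Stand-in slow path: a real runtime would obtain a GC block here. */
static void refill_block(mailbox_t *mb) {
    void *block = malloc(BLOCK_SIZE);
    if (block == NULL)
        abort();
    mb->alloc_ptr = (uintptr_t)block;
    mb->alloc_limit = mb->alloc_ptr + BLOCK_SIZE;
}

/* Fast path: advance the allocation pointer; call out only on overflow. */
static void *gc_alloc(mailbox_t *mb, size_t size) {
    size_t word = sizeof(uintptr_t);
    size = (size + word - 1) & ~(word - 1);  /* round up to a whole word */
    assert(size <= BLOCK_SIZE);              /* assumption: no large objects */
    if (mb->alloc_limit - mb->alloc_ptr < size)
        refill_block(mb);
    void *obj = (void *)mb->alloc_ptr;
    mb->alloc_ptr += size;
    return obj;
}
```

The fast path is a compare, an add, and a store, which is the sequence
I would want the code generator to emit inline.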
Obviously, there is potential for optimization in several places. If
you're not space-sensitive, you can duplicate code to deal with the
double-pointer/write barriers. You can also cache the data from the
mailboxes between safepoints. Lastly, write-logging can be delayed
until a safepoint is about to be executed, which might eliminate
duplicate logs as well.
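
As an illustration of the delayed write-logging idea, here is a rough
sketch in C. Everything in it is hypothetical: the pending_writes_t
buffer, PENDING_MAX, and the assumption that a log entry is simply the
address of the slot that was written. The duplicate check is a linear
scan, which is only reasonable for a small buffer.

```c
#include <assert.h>
#include <stddef.h>

enum { PENDING_MAX = 64 };  /* assumed buffer size, for illustration only */

/* Slots written since the last safepoint, each recorded at most once. */
typedef struct pending_writes {
    void **slots[PENDING_MAX];
    size_t count;
} pending_writes_t;

/* Remember a written slot. Returns 0 if the buffer is full, in which
   case the caller has to reach a safepoint before writing again. */
static int note_write(pending_writes_t *p, void **slot) {
    for (size_t i = 0; i < p->count; i++)
        if (p->slots[i] == slot)
            return 1;  /* already pending: no duplicate entry */
    if (p->count == PENDING_MAX)
        return 0;
    p->slots[p->count++] = slot;
    return 1;
}

/* Run just before a safepoint: copy pending slots into the write log if
   GC is underway, then reset the buffer. Returns the entries written. */
static size_t flush_writes(pending_writes_t *p, int gc_active,
                           void ***log, size_t log_avail) {
    size_t n = 0;
    if (gc_active) {
        assert(p->count <= log_avail);  /* caller guarantees enough room */
        for (; n < p->count; n++)
            log[n] = p->slots[n];
    }
    p->count = 0;
    return n;
}
```

Since the GC status only changes at safepoints, a thread that sees GC
is not underway could skip note_write() entirely until the next one.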
Having read over the LLVM documentation, it seems that most of this
should be fairly easy. My first question is, are there any potential
snags or pitfalls, or will any of this require a substantial amount
of work? Secondly, how many of the aforementioned optimizations would
LLVM do of its own accord?
Thanks.
--
Eric McCorkle
Computer Science Ph.D Student
ericmcc at cs.umass.edu