thr3ads.net - llvm dev - [LLVMdev] me being stupid: me vs the llvm codebase... [Oct 2007]

BGB

2007-Oct-23 22:19 UTC

[LLVMdev] me being stupid: me vs the llvm codebase...

----- Original Message ----- 
From: "Gordon Henriksen" <gordonhenriksen at mac.com>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Wednesday, October 24, 2007 1:45 AM
Subject: Re: [LLVMdev] me being stupid: me vs the llvm codebase...

On Oct 23, 2007, at 05:52, BGB wrote:
> I am assuming then that some external assembler is used (such as
> 'gas')?...
< In the static compilers, yes. The JIT directly serializes
instructions into memory without the aid of an external assembler.
There are also experimental built-in assemblers; LLVM calls them
object writers[1]. >

ok, ok. I just couldn't find them is all...
yeah, I once started doing direct instruction serialization (aided by an 
'assembler' that was in fact a mass of functions using a procedural 
interface).

eventually, I decided that this route was just too damn inconvenient, and so 
this was largely replaced by a printf-style interface (using a variation of 
NASM-style syntax, though with a few differences, such as allowing lumping 
multiple opcodes per line, ...).

but, then, later on the internals of the codegen went partly back to such an 
interface, though mostly because of the need to abstract things (generating 
different possible sequences of instructions depending on context and so on, 
what could later become the basis of 'VAS'...).

> it looks like much of the interconnection and data sharing is done
> through objects and templates?...
< That's correct. The LLVM intermediate representation (IR) is well-
suited for many transformations and analyses, which are generally
structured as passes[2]. The LLVM IR has object-oriented[3],
textual (.ll) [4], and binary (.bc "bitcode") [5] representations;
all are fully equivalent. However, it is more efficient not to wring
the program through multiple print/parse or write/read cycles, so the
object-oriented representation is generally maintained within any
single process. >

yes, ok.
I generally use whatever representation within the process, but many 
processes communicate textually. in part, this gives an easier way to 
inspect what is going on (I can look at a dump of the preprocessor output, 
parse trees, RIL code, or assembler and see how things are working, ...).

in part, this also allows maintaining isolation between the processes, where 
many may represent data differently. serializing to text and parsing from 
text allow a good deal of abstraction (one part is unwinding and dumping its 
output informally into a big text buffer), while another part winds back up 
potentially building a very different representation of the data (for 
example, the upper compiler is based around trees, where the lower compiler 
is more based around buffers and flat arrays, and doing funky bit-twiddling 
to cram much of the typesystem mechanics into a single 32bit integer...).
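
As an illustration of the kind of bit-twiddling described above, here is a minimal sketch of packing a type descriptor into one 32-bit integer. The field layout (8 bits of base type, 4 bits of pointer depth, 20 bits of array length) and all names are assumptions made up for this example; the thread does not say how the lower compiler actually lays out its type word.

```c
#include <stdint.h>

/* Hypothetical layout, NOT taken from the thread:
 *   bits  0-7   base type id
 *   bits  8-11  pointer indirection depth
 *   bits 12-31  array length (0 = not an array)
 */
enum { TY_INT = 1, TY_CHAR = 2, TY_FLOAT = 3 };

#define PTR_SHIFT 8
#define ARR_SHIFT 12

/* Build a packed type word; out-of-range fields are masked off. */
static uint32_t type_pack(uint32_t base, uint32_t ptr_depth, uint32_t arr_len)
{
    return (base & 0xFFu)
         | ((ptr_depth & 0xFu) << PTR_SHIFT)
         | ((arr_len & 0xFFFFFu) << ARR_SHIFT);
}

static uint32_t type_base(uint32_t t)      { return t & 0xFFu; }
static uint32_t type_ptr_depth(uint32_t t) { return (t >> PTR_SHIFT) & 0xFu; }
static uint32_t type_array_len(uint32_t t) { return t >> ARR_SHIFT; }

/* "T" -> "T *": bump the pointer depth (wraps past 15 in this sketch). */
static uint32_t type_add_pointer(uint32_t t)
{
    return type_pack(type_base(t), type_ptr_depth(t) + 1, type_array_len(t));
}
```

With this layout, `type_pack(TY_CHAR, 1, 0)` would stand for `char *`, and comparing two types is a single integer compare, which is presumably much of the appeal of such a scheme.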

it is also a lot easier to make changes to the printer or parser, than to go 
through the horrid pain of changing around a bunch of structs and/or having 
to modify a bunch of code for seemingly trivial alterations...

this was in fact a major reason for why I created my 'RPNIL' language...

I had assumed different languages would target it, so I wanted something 
hopefully fairly general. different frontends could be written hopefully 
without too much interdependence or conflict.

this is also why I had chosen a stack machine, as this offers at least some 
semblance of machine-abstraction. I am also fairly familiar with stack 
machines, as I have had a long time of fairly good success with them...

a major problem I have run into though is related to evaluation ordering, 
which though providing multiple options, doesn't really provide any great 
options.

in part, the x86-64 calling convention leaves a major point of pain I have 
not yet decided how to work around... (I need to figure some good way to 
either reorder the code or reorder the data within the confines of an 
abstract stack model...). one option, though disliked as it would require 
changing RPNIL, would be to make each function argument be a 
postscript-style block, with an instruction to indicate that it is an 
argument. (ick...).

also, originally I had assumed the RPNIL compiler would also be replaceable, 
but I am starting to doubt this...

the original idea I had when designing RPNIL was that by the time I got to 
this, I would implement a three-address-code version of the RPNIL compiler 
(thus making the stack almost purely an abstraction). but, my first, and 
currently only implementation, is a multi-pass linear processor (first pass, 
determines register usage and stack layout and so on, second pass generates 
code...).

going to TAC may be what I do if I split the RPNIL compiler, where the upper 
half will convert RIL to 'VAS' (just coming up with a term here, like 
'virtual assembler'), which would be more or less machine specific, but would 
not yet have worked through all the 'gritty details', such as the exact 
sequences of instructions used in representing these various operations (the 
VAS-stage would be mostly a thin translator working through the 
instruction-sequence details, and spitting out raw assembler).

another previously considered option was compiling from RPNIL to LLVM (LLVM 
seems to come somewhere somewhat lower-level than RPNIL, but a bit 
higher-level than what my 'VAS' idea would probably be...).

< The code generators also convert the program into the SelectionDAG
and MachineFunction forms, both of which are target-independent in
form but not in content.[6] Each of these forms has multiple states
with differing invariants. (Strictly speaking, however, these forms
are private to each code generator; the C backend does not use
either.) These code generation forms do not have first-class textual
or binary representations, since they are ephemeral data structures
used only during code generation. They can however be dumped to
human-readable text, or viewed with GraphViz. >

ok.

> doesn't appear much like working like a dynamic compiler is a major
> design goal (so I hear, it can be used this way, but this is not
> the focus).
>
> so, it looks like the design focuses mostly on taking the input
> modules, grinding it and mixing it, and doing lots of spiffy
> inter-module optimizations (presumably forming a monolithic output
> representing the entire project?...).
< LLVM does work well as a static (offline) compiler, where inter-
procedural optimization and link-time optimization are useful. In
llvm-gcc, link-time optimization ("mixing" as you say) only occurs
at -O4. Typically, IPO is performed only within a single
compilation unit (-O3/-O2). No IPO is performed at -O0. >

yes, ok.

note however, that I don't even want to touch the gcc-frontend in my 
projects...

the gcc codebase is a horror I would rather not have the misfortune of 
dealing with (much less trying to make it think it was something like lisp 
or python...).

> as a result, my compiler generally refrains from inlining things or
> doing brittle inter-function optimizations (after all, one could
> potentially relink parts of the image and break things...).
< It's possible to use LLVM in the same manner by simply refraining
from the use of inter-procedural optimizations. >

possibly, yes.

< If LLVM bytecode is used as the on-disk representation, however, LLVM
would allow the use of offline optimizations before starting the JIT
program. This could include IPO or LTO at the developer's option, and
would be entirely safe if the unit of dynamism were restricted to an
LLVM module, since LTO merges modules together. >

ok, dunno here.

> how well would LLVM work for being used in a manner comparable to
> LISP-style eval (or Self, Smalltalk, or Python style incremental
> restructuring)?...
< Simply codegen the string into a function at runtime, JIT it, and
call it.[7] Afterwards, the IR and the machine code representation
can be deleted. >

ok. how well does this work if, say, we decide to override a globally 
defined function with a newly defined one?...

> and incrementally replacing functions or modules at runtime?...
< Generally speaking, LLVM neither helps nor hinders here. Maybe
someone will follow up with whether the JIT uses stub functions which
would enable dynamic relinking. If not, it would be a straightforward,
if platform-specific, feature to add. >

I don't use proxy or stub functions, I relink them...

basically, a system of tables is kept keeping track of where all the various 
functions and variables are located, where all they are used from, ...

as a result, moving a function or variable causes the linker to go and 
'unlink' the references (relocating by the inverse addr so that the 
references point back to NULL), and 'relinking' (which involves modifying 
the references to point to the new location).

(in general, I don't relocate normal variables though, largely because this 
would tend to cause them to lose their value...).

now, sadly, one has to be pretty hopeful another thread is not running in 
the code when this happens, which is the risk...
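
To make the unlink/relink bookkeeping above concrete, here is a minimal sketch, not the actual linker. It assumes every reference is a plain absolute function-pointer slot, whereas the scheme described above patches relocations inside generated code by subtracting and re-adding addresses. `Symbol`, `sym_add_ref`, `sym_unlink`, `sym_relink` and the two sample functions are hypothetical names invented for the example.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

#define MAX_REFS 8

/* Hypothetical symbol record: where the function currently lives,
 * plus every slot known to hold its address. */
typedef struct {
    const char *name;
    fn_t        addr;            /* current location, NULL while unlinked */
    fn_t       *refs[MAX_REFS];  /* slots that refer to this symbol */
    size_t      nrefs;
} Symbol;

/* Register a referencing slot and bind it to the current address. */
static int sym_add_ref(Symbol *sym, fn_t *slot)
{
    if (sym->nrefs >= MAX_REFS)
        return -1;
    sym->refs[sym->nrefs++] = slot;
    *slot = sym->addr;
    return 0;
}

/* 'unlink': make every recorded reference point back to NULL. */
static void sym_unlink(Symbol *sym)
{
    for (size_t i = 0; i < sym->nrefs; i++)
        *sym->refs[i] = NULL;
    sym->addr = NULL;
}

/* 'relink': move the symbol and patch every recorded reference. */
static void sym_relink(Symbol *sym, fn_t new_addr)
{
    sym_unlink(sym);
    sym->addr = new_addr;
    for (size_t i = 0; i < sym->nrefs; i++)
        *sym->refs[i] = new_addr;
}

/* Two stand-in implementations of "the same" function. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }
```

Only slots registered through `sym_add_ref` get patched; a copy of the address stored anywhere else would be missed. Nothing here is synchronized either, so the caveat about other threads running in the code applies to this sketch as well.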

likewise, all the modules are kept linked together in a kind of "heap" of 
sorts...

I did it this way so that I can still use object modules and static 
libraries compiled with gcc in much the same way as code generated by my 
compiler...

<
— Gordon

[1]
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/
ELFWriter.cpp?view=markup
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/
MachOWriter.cpp?view=markup

[2] http://llvm.org/docs/WritingAnLLVMPass.html

[3] http://llvm.org/docs/ProgrammersManual.html#coreclasses

[4] http://llvm.org/docs/LangRef.html

[5] http://llvm.org/docs/BitCodeFormat.html

[6] http://llvm.org/docs/CodeGenerator.html

[7] watch this space, currently under rapid construction: http://
llvm.org/docs/tutorial/
In particular, observe the HandleTopLevelExpression function in §3.3
"Implementing Code Generation to LLVM IR." That function will be
extended to handle the eval usage in §3.4 "Adding JIT and Optimizer
Support."

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
yes, ok.

Gordon Henriksen

2007-Oct-23 23:26 UTC

[LLVMdev] me being stupid: me vs the llvm codebase...

On Oct 23, 2007, at 18:19, BGB wrote:
> On Oct 23, 2007, at 11:45, Gordon Henriksen wrote:
>
>> Generally speaking, LLVM neither helps nor hinders here. Maybe  
>> someone will follow up with whether the JIT uses stub functions  
>> which would enable dynamic relinking. If not, it would be a  
>> straightforward, if platform-specific, feature to add.
>
> I don't use proxy or stub functions, I relink them...
I misspoke. See here:

http://llvm.org/doxygen/classllvm_1_1JIT.html#a7

Relinking the function as you describe is risky in the case that the  
address of the function has been taken. LLVM's present approach is  
general.

— Gordon

BGB

2007-Oct-24 00:43 UTC

[LLVMdev] me being stupid: me vs the llvm codebase...

oh, ok.

actually, I had partly considered this approach at one point, but opted for the
form I did instead (in large part because it does not involve such a tweak, or
dependency on the previous location).

of course, as noted, due to the possibility of function pointers, this is a
little risky. I had not considered this issue previously, but it is definitely
worth consideration...

I guess the issue then is how likely and how often a function will be
replaced...


it does lead to a possible idea though:
use of an inter-module jump table.

in this way, the actual usage of any external functions involves jumping through
such a table (either directly, or by making use of an implicit jump).
sad issue though: this could incur a possibly noticeable performance overhead
(for normal calls, but it could make moving functions really cheap).
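
A rough sketch of what such a jump table could look like in plain C, under the assumption that each external function gets a fixed slot index. `jump_table`, `jt_bind`, `jt_call`, the stub `foo` and the two sample targets are all hypothetical names for this example.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

#define JT_SLOTS 16

/* One slot per external function; the slot index is fixed at link
 * time, the slot contents may be rewritten at any point. */
static fn_t jump_table[JT_SLOTS];

/* Point a slot at a new implementation. This is the whole "move". */
static int jt_bind(size_t slot, fn_t target)
{
    if (slot >= JT_SLOTS)
        return -1;
    jump_table[slot] = target;
    return 0;
}

/* A call through the table costs one extra load and an indirect
 * call. The slot must have been bound first, or this calls NULL. */
static int jt_call(size_t slot, int arg)
{
    return jump_table[slot](arg);
}

/* Stub with a stable address: callers (and anything that has taken
 * &foo) keep working after slot 0 is rebound. */
static int foo(int x)
{
    return jt_call(0, x);
}

/* Two stand-in implementations. */
static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }
```

Because the address of `foo` never changes, a saved function pointer to it survives replacement, which is the case that direct relinking has trouble with. The price is the extra indirection on every call, as noted above.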


a more selective way of doing this would be helpful (for example, a special
compiler-specific keyword, or some other approach for telling the VM/linker that
a particular function is movable).

this would probably one-off the function, by forcibly relinking it as a proxy to
the function (any future attempts at relinking the function then simply adjust
its associated pointer).

potentially, this could also be made the default behavior for when relinking
functions (if they are being forcibly relinked, well, assume they are being made
movable).

or such...


ccMakeMovable(char *func);    //an explicit call to the VM

__movable void foo(int x)
{
    //do something...
}


even more interestingly:
if the same compiler were also used for static compilation, it could be used as
a special feature to make such dynamic movability available even for statically
compiled and linked code (as is, in my case, parts of the app which are
statically compiled and linked, can't currently be relinked...).

(even more cool: it does not necessarily imply a runtime dependency, since the
stub and proxy variable would be merely passive elements, it could still be
possible to compile and link the code apart from the VM, never mind that any such
proxy stubs would be useless though...).

absent special compiler support, this could be implemented more manually with an
existing compiler, such as gcc.

example convention:
int Afsd875gSd57g8Th(int x)    //invisible autogen name
{
    //do something
}

int (*__proxy_foo)(int x)=&Afsd875gSd57g8Th;    //relinking only rewrites this pointer
int foo(int x) { return(__proxy_foo(x)); }    //stable stub: calls through the proxy

or (assembler):
section .data
__proxy_foo dd Afsd875gSd57g8Th

section .text
Afsd875gSd57g8Th:
push ebp
mov ebp, esp
..
pop ebp
ret

foo:
jmp [__proxy_foo]


or such...

