On Sun, 2003-11-16 at 11:45, Chris Lattner wrote:> The JIT doesn't even load unreferenced functions from the disk, so this
> shouldn't be the case... (thanks to Misha for implementing this :)
> Also, the globaldce pass deletes functions which can never be called by
> the program, so large hunks of libraries get summarily removed from the
> program after static linking.
Oh! Good news! >
> > > There are multiple different ways to approach these questions
depending on
> > > what we want to do and what the priorities are. There are
several good
> > > solutions, but for now, everything needs to be statically linked.
I
> > > expect this to change over the next month or so.
>
> > When you have time, I'd like to hear what you're planning in
this area
> > as it will directly effect how I build my compiler and VM.
>
> What do you need, and what would you like? At this point there are
> several solutions that make sense, but they have to be balanced against
> practical issues.
You'll find that I'm a pragmatic idealist. :)
So, ultimately, what I would >like< is a fully indexed bzip2 compressed
archive of byte code files that could be loaded or partially loaded into
a running program. Individual components of the package (i.e. the
modules it is composed of) would retain their identity through the
optimization and linking process. Such an archive would serve as the
"package" or "shared object" we referenced earlier in this
discussion.
The JIT would still avoid loading unreferenced modules, functions, or
variables from the archive.
The construction of such an archive would first optimize the bytecode
together as a package (like running opt on the whole thing first) but
would ensure that external linkage functions and global variables were
not eliminated. Because the package is to be loaded as a plug-in,
certain functions and global variables must not be eliminated during
optimization. To support this kind of partial optimization, it might be
useful to add an "interface" keyword to LLVM assembly that indicates a
function or global variable that is not to be eliminated under any
circumstances.
Does this capability exist now under a different name?
What I >need< is something that fits the requirements for my VM (see
below).
> For example, say we do IPO across the package, and then
> one of the members get updated. How do we know to invalidate the results?
Use the time stamp on the package's file?
If it changes, we'd essentially have to do the optimizations again?
>
> As I think that I have mentioned before, one long-term way of implementing
> this is to attach analysis results to bytecode files as well as the code.
> Thus, you could compile libc, say, with LLVM to a "shared object"
bytecode
> file. While doing this, the optimizer could notice that "strlen"
has no
> side-effects, for example, and attach that information to the bytecode
> file.
Yes, this is an excellent idea as it will avoid re-computation of
analysis results and thereby speed up optimized linking. See my
(upcoming) comments on this topic in response to Vipin's posting on the
same.>
> When linking a program that uses libc, the linker wouldn't pull in any
> function bodies from "shared objects", but would read the
analysis results
> and attach them to the function prototypes in the program. This would
> allow the LICM optimizer, to hoist strlen calls out of loops when it makes
> sense, for example.
Sounds good.>
> Of course there are situations when it is better to actually link the
> function bodies into the program too. In the strlen example, it might be
> the case that the program will go faster if strlen is inlined into a
> particular call site.
Yes, in fact, one might want to take a program running JIT and tell it
to just completely optimize it and write a native executable for the
next time it runs.>
> I'm inclined to start simple and work our way up to these cases, but if
> you have certain usage patterns in mind, I would love to hear them, and we
> can hash out what will really get implemented...
>
> -Chris
Okay, at the risk of being verbose, here goes. I'm going to relate the
requested "usage patterns" to the virtual machine I'm constructing
named
XVM.
In XVM, there are three essential parts to a program: the user's code,
the XVM code, and the XVM plug-ins. The user's code is compiled with
full knowledge of the services available from the XVM code, but not with
any knowledge of the plug-ins. The XVM completely hides (on purpose)
the details of the plug-ins. They are "implementation details".
Similarly, the plug-ins implement a standard API and are completely
unaware of the end-user's program (more on this below).
While I had originally intended to compile XVM itself to fast native
code using GCC, I'm beginning to believe that it might be more effective
to compile it to bytecode so that the XVM itself could be optimized for
the way the user's code uses it. Or, perhaps do it both ways and leave
the determination of how a program gets optimized (or not!) until it is
executed.
User's programs executed by the XVM do not have a "main". The XVM
itself
implements "main", handles configuration, manages memory and access to
system resources very strictly, etc. What >is< executed from the
user's
code are tasks. A task is simply a non-blocking function that has an
input queue and zero or more output queues. The queues are managed by
the XVM, not the task. The tasks run asynchronously to each other. Each
task registers for certain kinds of events and produces other events. To
get it started, there is an "initial event" that some task must
register
for. Think of an event as the set of arguments to a function. When the
XVM executes a user's program, it simply generates the initial event and
provides processing time to these tasks. If the program generates a
terminate event, the XVM shuts down after allowing the program to
completely process the terminate event. As the program executes, there
are XVM operations that the user's program can invoke to load or unload
other tasks. In this way the nature of the "program" can change over
time. The reason for this is that a program might have multiple
operating modes (e.g. online vs. batch) during which completely
different sets of tasks execute independently or in conjunction with one
another (i.e. the online tasks might behave differently while batch is
executing but that is simply a fact of loading the batch tasks, not
something inherent or coded into the online tasks). There are many
program design choices that this facility enables. If you're familiar
with "active objects", a task is somewhat akin to that.
I imagine this scenario will be rather difficult for LLVM to optimize at
runtime. This is one of the reasons that I was asking about creating
optimized packages. Presumably the modules necessary to implement some
set of related tasks (e.g. the batch tasks) would all be combined and
optimized into a single package. Not everything in the package may get
executed because it depends on the events being generated by the other
tasks already executing. This is important because the XVM doesn't
necessarily know the full graph of tasks it will execute when it starts.
In general, it won't even know the full set before it finishes! Tasks
(and consequently the packages they live in) can be loaded very
dynamically by the user's program, even deciding to use variants of
similar task packages at different times or basing the decision on some
configuration file that gets read at runtime or any other criteria.
Indeed, it will, in general, not be possible for LLVM to know which
packages are going to eventually get loaded. How would LLVM address
optimizing such a program? My solution so far has been to just optimize
the hell out of the individual packages and leave it at that. But, given
that LLVM's thing is "continuous whole program optimization", can
it
address an environment where it doesn't even know what the "whole
program" is?
Back to plug-ins. XVM supports plug-ins for two reasons: extensibility,
and alternative implementations of standard APIs. The XVM knows about
security, transactions, database access, memory management, naming
services, etc. Each of these capabilities has a standardized API or set
of APIs. Multiple implementations of an API can co-exist in the XVM. For
example, there might be a database access plug-in for each of Oracle,
MySql, SqlServer, etc. Similarly for security providers, naming
services, etc. So, what I need is the ability to load multiple
packages, all implementing the same API (presumably through a function
pointer table). I would like these packages to be distributed as either
real shared objects (native code) or compressed archives of bytecode. In
either case the package is a single file.
For extensibility, XVM allows new APIs to be added to the machine such
that programs that know about the API can invoke them. That is, it is
possible to create in XPL both an extension (that plugs in to the back
end of the XVM) and a program that uses the extension right up to the
soruce code level. To make this concrete, suppose we had the need to
extend the base XPL programming language to provide a complete package
for dealing with complex and rational numbers rather than just ints and
floats as is supported in the base language. This would be achieved
with a plug-in to both the XPL compiler (allowing interpretation of the
new fundamental types and their operators) and the XVM backend. The
plug-in would implement the operators on the complex and rational
numbers. The compiler would invoke those operators through the plug-in's
interface.
If statically compiled native code is provided for these plug-ins, the
loader just loads the shared object and probably doesn't attempt any
optimizations. If the analysis results you spoke of could be passed
through to native shared objects then I suppose some optimizations could
be done by the loader but I'm not particularly concerned about that
because presumably the optimizations in the loaded package are
sufficient (i.e. each interface entry point represents a significantly
large amount of processing that optimizing its binding into the larger
program is negligible).
If bytecode is provided, the loader has the opportunity to do "whole
program" optimization and should (optionally) do that.
In both the above cases (native code and bytecode) it is likely that the
entire set of API calls will be invoked by the XVM because, presumably,
the XVM has calls to each of them somewhere (even though the user's
program may not invoke the XVM functionality that would result in some
of the plug-in calls being executed). This means that there is little
to be gained in eliminating external functions in the packages because
at some point they >might< get called. To fully optimize this, what is
needed is a "whole program" optimization that incorporates the end
user's program, with the XVM code, and the plug-in code. And, it needs
to work regardless of whether we're speaking of static executables (e.g.
statically linked XVM), dynamically generated native code (e.g. JIT), or
bytecode. And, its complicated by the fact that we don't know ahead of
time what plug-ins or packages will get loaded.
The XVM also needs to be able to run in one of four modes, that I've
described before:
1. Script: goal is to begin executing quickly, do little
optimization, and get the program execution over with quickly.
This is for running "quickie" programs where you don't
want to
wait a long time for compilation, optimization, linking or
loading because the task at hand is very short.
2. Filter: goal is to be moderately efficient but still begin
execution quickly. I would expect filters to be optimized in
bytecode and then executed directly from that byte code.
3. Client: optimize and compile everything as it is loaded and
used. The nature of "Client" programs is that they are
unpredictable in what they will use because they are directed by
a human.
4. Server: compile, optimize and link the entire program statically
using as many optimizations as possible. The program is intended
to be a long running service.
I think all these cases are covered by existing LLVM functionality but
thought I'd mention it for completeness.
Sorry for being so long winded .. not sure how else to describe the
usage scenario other than just putting it all out there.
Reid.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20031116/a939480f/attachment.sig>