thr3ads.net - llvm dev - [LLVMdev] LLVM Archive Format Extension Proposal [Nov 2012]

Relph, Richard

2012-Nov-22 00:00 UTC

[LLVMdev] LLVM Archive Format Extension Proposal

On Nov 21, 2012, at 11:07 AM, Nick Kledzik <kledzik at apple.com> wrote:


On Nov 21, 2012, at 8:55 AM, Relph, Richard wrote:
AMD would like to add new functionality to ranlib (and later ar and nm) and to
the bits of LLVM Core that read (and later write) archives.
Herewith a terse summary of the change, which we want in order to improve
support for OpenCL on multiple GPUs in a single run-time.

Conceptually, a serialized archive is really 2 pieces: a few header members and
a set of normal file members. There are no constraints on the normal members in
the 'pure' archive format. They could be text files, pictures, or, as
we're all familiar with, object modules. Most object file archives are
"libraries" and they have a special header member that is a global
symbol table, associating global scope names with defining object module members
in the archive body.
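
The two-piece layout described above can be sketched in Python. This is a minimal illustration, assuming the GNU/SysV archive variant, in which the symbol table is a first member named "/" holding a big-endian count, one header offset per symbol, and NUL-terminated names. The helper names (make_member, build_archive, list_members, read_symtab) are hypothetical and are not part of LLVM or any ar tool.

```python
import struct

MAGIC = b"!<arch>\n"                          # global archive magic
HEADER = struct.Struct("16s12s6s6s8s10s2s")   # 60-byte member header

def make_member(name: bytes, data: bytes) -> bytes:
    """Serialize one member: header, data, newline pad to an even length."""
    header = HEADER.pack(name.ljust(16), b"0".ljust(12), b"0".ljust(6),
                         b"0".ljust(6), b"644".ljust(8),
                         str(len(data)).encode().ljust(10), b"`\n")
    return header + data + (b"\n" if len(data) % 2 else b"")

def build_archive(members, symbols):
    """members: list of (name, data); symbols: {symbol: index into members}."""
    symtab_size = 4 + 4 * len(symbols) + sum(len(s) + 1 for s in symbols)
    first = len(MAGIC) + HEADER.size + symtab_size + symtab_size % 2
    offsets, body = [], b""
    for name, data in members:
        offsets.append(first + len(body))     # file offset of this member's header
        body += make_member(name, data)
    table = struct.pack(">I", len(symbols))
    table += b"".join(struct.pack(">I", offsets[i]) for i in symbols.values())
    table += b"".join(s.encode() + b"\0" for s in symbols)
    return MAGIC + make_member(b"/", table) + body

def list_members(image: bytes):
    """Yield (name, header offset, data) for each member of an in-memory archive."""
    if not image.startswith(MAGIC):
        raise ValueError("not an ar archive")
    pos = len(MAGIC)
    while pos + HEADER.size <= len(image):
        name, _, _, _, _, size, end = HEADER.unpack_from(image, pos)
        if end != b"`\n":
            raise ValueError("bad member header")
        size = int(size.decode().strip())
        start = pos + HEADER.size
        yield name.rstrip().decode(), pos, image[start:start + size]
        pos = start + size + size % 2         # members start on even offsets

def read_symtab(image: bytes):
    """Return {symbol: header offset} from a GNU-style "/" first member."""
    name, _, data = next(list_members(image))
    if name != "/":
        return {}
    (count,) = struct.unpack_from(">I", data, 0)
    offsets = struct.unpack_from(f">{count}I", data, 4)
    names = data[4 + 4 * count:].split(b"\0")[:count]
    return {n.decode(): off for n, off in zip(names, offsets)}

image = build_archive([(b"sin.o/", b"SIN!"), (b"cos.o/", b"COS")],
                      {"sin": 0, "cos": 1})
members = {off: name for name, off, _ in list_members(image)}
symtab = read_symtab(image)
```

Because the sketch works on a bytes object rather than a file, the same reader applies to an archive image embedded in memory.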

We have N very large archives, defining essentially the same set of symbols.
Many of the normal file members of each are duplicated in other archives, but
not all. The goal is to produce a single "super-archive" that
contains 1 copy of each unique object file member no matter how many archives it
is part of, and N symbol table members representing each of the original N
archives.
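
As an illustration of the super-archive idea, the following Python sketch keeps one copy of each unique member and builds one symbol table per original archive. It is only a model under stated assumptions: members are compared by a SHA-256 hash of their contents, symbol tables map to list indices rather than file offsets, and the function name build_super_archive and the sample archive names are hypothetical.

```python
import hashlib

def build_super_archive(archives):
    """archives: {archive_name: {member_name: (data, [symbols])}}.

    Returns (unique_members, tables), where tables[archive_name] maps each
    symbol to the index of its defining member in unique_members.
    """
    unique, index_of, tables = [], {}, {}
    for arch_name, members in archives.items():
        table = {}
        for _member_name, (data, symbols) in members.items():
            key = hashlib.sha256(data).digest()
            if key not in index_of:           # first time we see these bytes
                index_of[key] = len(unique)
                unique.append(data)
            for sym in symbols:
                table[sym] = index_of[key]
        tables[arch_name] = table
    return unique, tables

archives = {
    "gpu_a.a": {"sin.o": (b"generic sin", ["sin"]),
                "fma.o": (b"fma for A", ["fma"])},
    "gpu_b.a": {"sin.o": (b"generic sin", ["sin"]),
                "fma.o": (b"fma for B", ["fma"])},
}
unique, tables = build_super_archive(archives)
```

Here the identical sin.o is stored once and shared by both tables, while each table points at its own fma.o.
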
Let me see if I understand your need here.  You are dynamically generating code
and need to link it.  The linking step requires some support routines, which it
makes sense to have in a static library.  Since this must work on machines not
set up with developer tools, you are packaging the static library inside a
DLL/DSO.  In addition, with all the minor variations of GPUs, having a separate
archive for every GPU type would be too large, so you need some way to remove
duplicates of support functions.

Close. But it isn't "support routines". It's the entire OpenCL
"built-in" library… thousands and thousands of functions, some tiny,
some large, some merely aliases.

If the above summary is close, then here are two other ideas that avoid the need
for archive/TOC changes:
1) Have lots of little archives which removes duplicates.  Give each archive a
unique name, then have a lookup table which lists which sequence of archives to
use for which specific GPU.  When linking for a particular GPU, you pass the
linker that particular sequence of little archives to search.

This is close to what we have now, but we find it unworkable because the Linker
doesn't take a "set" of libraries. So if library A calls something
in B, which calls something in A, you end up having to make multiple passes over the
libraries, which isn't efficient and creates other headaches. This is what
we are trying to 'solve' by going to a single library, but we want to
preserve the space advantage of the current approach.

2) Have one giant archive per GPU family and use a name-mangling scheme to
filter.  For instance, the compiler emits references to support routine
"foo" as "foo$gpu1".  Then you construct the support
libraries with aliases for each support function.  So a particular
implementation of foo in an archive may show up in the symbol table through its
aliases "foo$gpu3", "foo$gpu4", "foo$gpu7".   When
the linker is only looking for "foo$gpu1", it will ignore all other
foo implementations and just pick the one aliased to "foo$gpu1".
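
The alias scheme in idea 2 can be modeled with a small Python sketch. It assumes the "name$gpu" mangling shown above; the member names (foo_v1.o, foo_v2.o) and the helpers build_alias_table and resolve are hypothetical, made up for illustration.

```python
def build_alias_table(implementations):
    """implementations: list of (member_name, base_symbol, [gpu ids]).

    Each implementation is published under one alias per GPU it serves.
    """
    table = {}
    for member, base, gpus in implementations:
        for gpu in gpus:
            table[f"{base}${gpu}"] = member
    return table

def resolve(table, base, gpu):
    """Look up only the alias for the GPU being linked; other aliases are ignored."""
    return table.get(f"{base}${gpu}")

table = build_alias_table([
    ("foo_v1.o", "foo", ["gpu1", "gpu2"]),
    ("foo_v2.o", "foo", ["gpu3", "gpu4", "gpu7"]),
])
chosen = resolve(table, "foo", "gpu1")
```

A link for gpu1 sees only "foo$gpu1", so it picks foo_v1.o even though foo_v2.o also defines foo.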

I'll have to think about this… We try to treat the IR emitted by the
front-end as pretty family/device neutral to the extent that's possible.
We'd want to implement this as a pass between the linking of user-produced
code and the subsequent linking of the OpenCL library. And we'd obviously
have to create a tool that takes the current library members and aliases them to
the 'correct' unique implementation.

BTW,  Apple/Darwin has a similar issue with supporting multiple CPUs.  Our
solution is "fat" archives.  The ranlib tool sorts all archive members
by cpu, builds "thin" archive libraries for each cpu, then
concatenates the thin libraries together with a "fat header" which
specifies the file offset and size of each thin archive.  The linker, when it
comes across a fat static library, looks at the fat header, then seeks along to
the contained thin archive relevant to the cpu type being linked.  This scheme
works well, but it can produce large files because there is no attempt to reduce
duplicates.
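
The fat-header lookup can be sketched in Python as follows. This is a simplified model and not the actual Darwin on-disk layout: the magic value, the three-field entry (cpu id, offset, size), and the function names pack_fat and select_slice are all assumptions made for illustration.

```python
import struct

FAT_MAGIC = 0x46415421   # hypothetical magic, not the real Darwin value

def pack_fat(slices):
    """slices: {cpu_id: thin archive bytes} -> header + entries + thin archives."""
    header_size = 8 + 12 * len(slices)
    entries, body, pos = b"", b"", header_size
    for cpu, data in slices.items():
        entries += struct.pack(">III", cpu, pos, len(data))  # cpu, offset, size
        body += data
        pos += len(data)
    return struct.pack(">II", FAT_MAGIC, len(slices)) + entries + body

def select_slice(image, cpu):
    """Read the fat header and return the thin archive for one cpu, or None."""
    magic, count = struct.unpack_from(">II", image, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat image")
    for i in range(count):
        entry_cpu, offset, size = struct.unpack_from(">III", image, 8 + 12 * i)
        if entry_cpu == cpu:
            return image[offset:offset + size]
    return None

fat = pack_fat({7: b"!<arch>\nthin-a", 12: b"!<arch>\nthin-b"})
thin = select_slice(fat, 12)
```

Each thin archive is stored whole, which is why this scheme does nothing to share identical members between CPUs.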

Sounds pretty similar in terms of the problem… except we are motivated to avoid
duplicates because a lot of the members are truly identical, whereas the
"fat" archive probably has few identical members at the binary level
because of instruction set and calling convention differences. This is most
similar to my first considered approach.

Thanks,
Richard


-Nick



The symbol table for each original archive can properly index to the relevant
members in the archive, even if other members in the super-archive (not
referenced in this particular symbol table, of course) define the same symbols.

I've considered 3 approaches to the problem so far. All involve a new
archive member type.

First, a new archive member type "up front" that describes each of the
original archives and its symbol table.
Second, a normal/default symbol table member "up front" and a new
archive member type that describes alternate symbol tables contained in the
archive.
Third, a "hiding" archive member type that is essentially a way to
"skip over" additional normal archive file headers to reach the first
normal member, which (in all approaches) all archives share.

The third, I think, requires the fewest changes to the existing implementation,
so I'm leaning towards it. The "hiding" archive member would have
the "file name" of the represented archive immediately following the
member header, followed by a completely normal archive representation starting
with "!<arch>\n" and optionally including an additional
"hiding" archive member covering even more hidden archives.

The plan is to extend the Archive class to provide a way for clients to
select a desired archive. I will also enhance ranlib to accept multiple archive
names on the command line and produce the "super-archive" from them.

A further need we have is to serialize the TOCs and the super-archive in a
memory image (our archives are embedded in our DLL/SO, not stored separately on
disk) and then provide an interface to the relevant LLVM classes (Linker,
primarily) for accessing archives in memory rather than on disk, a feature
absent from the current implementation.

For our purposes, extending the Archive class to support specification of the
archive using a memory object instead of a file, recognizing the
"hiding" member type, and extending ranlib to produce the new super
archives is all we really need.

Any thoughts or suggestions would be welcome.

Thanks,
Richard

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



Nick Kledzik

2012-Nov-24 22:48 UTC

[LLVMdev] LLVM Archive Format Extension Proposal

On Nov 21, 2012, at 4:00 PM, Relph, Richard wrote:
> 
> This is close to what we have now, but we find it unworkable because the Linker
> doesn't take a "set" of libraries. So if library A calls something in B, which
> calls something in A, you end up having to make multiple passes over the
> libraries, which isn't efficient and creates other headaches. This is what
> we are trying to 'solve' by going to a single library, but we want to
> preserve the space advantage of the current approach.

The GNU linker has the option:
  --start-group archives --end-group
which tells the linker to keep searching the list of archives (not just make one
pass). The Darwin linker does this by default.
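
The difference between one pass and a repeated group search can be illustrated with a toy Python model. It is only a sketch of the resolution order, not of any real linker: each archive is a dict from a symbol to the set of symbols its defining member needs, and the function name link and the sample symbols are hypothetical.

```python
def link(undefined, archives, group=False):
    """Return the symbols still undefined after searching the archives.

    archives: list of {symbol: set of symbols the defining member references}.
    With group=False each archive is visited once, in order. With group=True
    the whole list is searched again until a pass resolves nothing new.
    """
    undefined, defined = set(undefined), set()
    while True:
        progress = False
        for archive in archives:
            while True:
                # Keep pulling members from this archive while it still helps.
                hits = [s for s in undefined if s in archive]
                if not hits:
                    break
                for s in hits:
                    defined.add(s)
                    undefined.discard(s)
                    undefined |= archive[s] - defined
                progress = True
        if not group or not progress:
            return undefined

lib_a = {"a1": {"b1"}, "a2": set()}   # a1 calls into B
lib_b = {"b1": {"a2"}}                # b1 calls back into A
single_pass = link({"a1"}, [lib_a, lib_b])
grouped = link({"a1"}, [lib_a, lib_b], group=True)
```

In the single pass, a2 is discovered only after library A has already been searched, so it stays undefined; the grouped search goes around again and resolves it.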

BTW, I had assumed you were using the "system linker".   But that
won't work if you are adding some new kind of archive table of contents.  I
guess that means you have some in-process code that provides linking
functionality?  If so, I'd like to understand this better to see if this is
something lld <http://lld.llvm.org/> might want to support some day.


-Nick


Relph, Richard

2012-Dec-06 17:32 UTC

[LLVMdev] LLVM Archive Format Extension Proposal

On Nov 24, 2012, at 2:48 PM, Nick Kledzik <kledzik at apple.com> wrote:

The GNU linker has the option:
  --start-group archives --end-group
which tells the linker to keep searching the list of archives (not just make one
pass). The Darwin linker does this by default.

BTW, I had assumed you were using the "system linker".   But that
won't work if you are adding some new kind of archive table of contents.  I
guess that means you have some in-process code that provides linking
functionality?  If so, I'd like to understand this better to see if this is
something lld <http://lld.llvm.org/> might want to support some day.

We are using the Linker class, which uses the Bitcode/Archive class, which is
why I was focusing on that class to support the new functionality.
Once I have Bitcode/Archive enhanced, and llvm-ar generates the new
"multi-archives", I was going to add a method to the Linker class to
support linking "in-memory archives"… not strictly part of the Archive
enhancements, but something else we need.

Richard
