thr3ads.net - llvm dev - [LLVMdev] LLVM Archive Format Extension Proposal [Dec 2012]


Relph, Richard

2012-Nov-22 00:28 UTC

[LLVMdev] LLVM Archive Format Extension Proposal

On Nov 21, 2012, at 12:09 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> On Wed, Nov 21, 2012 at 8:55 AM, Relph, Richard <Richard.Relph at amd.com> wrote:
>> AMD would like to add new functionality to ranlib (and later ar and nm) and
>> to the bits of LLVM Core that read (and later write) archives.
>> Herewith a terse summary of the change, which we want in order to improve
>> support for OpenCL with multiple GPUs in a single run-time.
>> 
>> Conceptually, a serialized archive is really 2 pieces: a few header members
>> and a set of normal file members. There are no constraints on the normal
>> members in the 'pure' archive format. They could be text files, pictures,
>> or, as we're all familiar with, object modules. Most object file archives
>> are "libraries", and they have a special header member that is a global
>> symbol table, associating global-scope names with the defining object
>> module members in the archive body.
>> 
>> We have N very large archives, defining essentially the same set of symbols.
>> Many of the normal file members of each are duplicated in other archives,
>> but not all. The goal is to produce a single "super-archive" that contains
>> 1 copy of each unique object file member, no matter how many archives it is
>> part of, and N symbol table members representing each of the original N
>> archives.
>>
>> The symbol table for each original archive can properly index to the
>> relevant members in the archive, even if other members in the super-archive
>> (not referenced in this particular symbol table, of course) define the same
>> symbols.
>> 
>> I've considered 3 approaches to the problem so far. All involve a new
>> archive member type.
>>
>> First, a new archive member type "up front" that describes each of the
>> original archives and its symbol table.
>> Second, a normal/default symbol table member "up front" and a new archive
>> member type that describes alternate symbol tables contained in the
>> archive.
>> Third, a "hiding" archive member type that is essentially a way to "skip
>> over" additional normal archive file headers to reach the first normal
>> member, which (in all approaches) all archives share.
>> 
>> The third, I think, requires the fewest changes to the existing
>> implementation, so I'm leaning towards it. The "hiding" archive member would
>> have the "file name" of the represented archive immediately following the
>> member header, followed by a completely normal archive representation
>> starting with "!<arch>\n" and optionally including an additional "hiding"
>> archive member covering even more hidden archives.
>> 
>> The plan is to extend the Archive class to provide a way for clients to
>> select a desired archive. I will also enhance ranlib to accept multiple
>> archive names on the command line and produce the "super-archive".
>> 
>> A further need we have is to serialize the TOCs and the super-archive in a
>> memory image (our archives are embedded in our DLL/SO, not stored separately
>> on disk) and then provide an interface to the relevant LLVM classes (Linker,
>> primarily) for accessing archives in memory rather than on disk, a feature
>> absent from the current implementation.
>> 
>> For our purposes, extending the Archive class to support specification of
>> the archive using a memory object instead of a file, recognizing the
>> "hiding" member type, and extending ranlib to produce the new super-archives
>> is all we really need.
>> 
>> Any thoughts or suggestions would be welcome.
>> 
>> Thanks,
>> Richard
> 
> Note that I plan to remove llvm/Bitcode/Archive once Object/Archive is
> capable of replacing it. The llvm tools that don't write archive
> files have already been switched over to it. Object/Archive already
> supports MemoryBuffer as a source for the data.
I had meant to ask in my email about the apparent duplication of Archive in
Bitcode and Object libs… Good to know. Since ranlib currently uses Bitcode,
that's what I've been focusing on, but I had noticed the Object/Archive.h.
> I support Nick's solution over extending the archive format
> internally. I would also support adding a true wrapper. What I don't
> like is not knowing that it's a special archive by just looking at the
> file header.
I will think about Nick's second idea some more.
When you say a "true wrapper", are you suggesting a specific "first member"
type that encapsulates the multiple symbol tables and serves as a flag? I'm not
sure what you're getting at here, since the only "file header" an overall
archive file has is the 8-character !<arch>\n signature. Everything else
(string table, various kinds of symbol tables) is merely an optional "special
member", noted by its "special" filename. My "hiding" special member is of the
same kind... I had tentatively chosen #_MULTI_SYMTAB_# as the special name of
this "hiding" special member. The changes to the Bitcode Archive
implementation to handle reading it are pretty simple and completely
consistent with the existing "special member" handling code.
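For readers less familiar with the layout discussed here: a common-format archive is the 8-byte !<arch>\n signature followed by members, each preceded by a fixed 60-byte ASCII header (16-byte name, 12-byte date, 6-byte uid, 6-byte gid, 8-byte mode, 10-byte size, 2-byte "`\n" terminator), with member data padded to an even offset. The Python sketch below reads such an archive from an in-memory buffer (the "memory object" use case above), flags special members purely by name, and decodes a hiding member. The #_MULTI_SYMTAB_# name comes from this thread and happens to fill the 16-byte name field exactly; the payload layout (archive name terminated by a newline, then a nested archive) is an assumption, since the proposal does not say how the name is delimited. The helper names are hypothetical and this is not LLVM's Archive code.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60
HIDING_NAME = "#_MULTI_SYMTAB_#"  # name proposed in this thread; exactly 16 characters

# Names that mark "special" members: "/" (GNU symbol table), "//" (GNU long-name
# table), "__.SYMDEF" (BSD symbol table) and the proposed hiding member.
SPECIAL_NAMES = {"/", "//", "__.SYMDEF", HIDING_NAME}


def make_member(name: str, data: bytes) -> bytes:
    """Serialize one member: a 60-byte ASCII header, the data, and a pad byte if odd."""
    if len(name) > 16:
        raise ValueError("name does not fit the 16-byte field")
    header = (name.ljust(16) + "0".ljust(12) + "0".ljust(6) + "0".ljust(6)
              + "644".ljust(8) + str(len(data)).ljust(10) + "`\n")
    return header.encode("ascii") + data + (b"\n" if len(data) % 2 else b"")


def parse_members(buf: bytes) -> list:
    """Eagerly parse every member of an archive held entirely in memory."""
    if not buf.startswith(AR_MAGIC):
        raise ValueError("missing !<arch> signature")
    members = []
    pos = len(AR_MAGIC)
    while pos < len(buf):
        header = buf[pos:pos + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError(f"malformed member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii"))
        data = buf[pos + HEADER_SIZE:pos + HEADER_SIZE + size]
        members.append({"name": name, "special": name in SPECIAL_NAMES, "data": data})
        pos += HEADER_SIZE + size + (size % 2)  # skip data plus the even-alignment pad
    return members


def parse_hiding_member(data: bytes):
    """Hypothetical payload: b'<archive name>\\n' followed by a nested !<arch> archive."""
    name, sep, nested = data.partition(b"\n")
    if not sep:
        raise ValueError("hiding member payload has no name terminator")
    return name.decode("ascii"), parse_members(nested)
```

Note that parse_members treats the hiding member like any other member and steps over it using its size field; that is exactly the property the "third approach" relies on for readers that do not know the new name.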

Writing these is much more complicated, though, which is why I was focusing on
ranlib for prototyping. Once I had something working, I'd know better how or
even whether to fold it into the Archive class.
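The bookkeeping a super-archive writer would need (one stored copy of each unique member, plus one symbol table per original archive) can be sketched in Python as below. The input shape of (member name, contents, defined symbols) tuples and the function name are hypothetical; members are treated as duplicates only when their contents are byte-identical (compared by SHA-256 digest), and serializing the result into hiding members is left out.

```python
import hashlib


def build_super_archive(archives: dict) -> tuple:
    """Merge N archives into one member list plus one symbol table per archive.

    `archives` maps an archive name to a list of (member_name, data, symbols)
    tuples (a hypothetical input shape). Returns (unique_members, symtabs), where
    symtabs[archive][symbol] is an index into unique_members.
    """
    unique_members = []   # (member_name, data); each distinct content stored once
    index_by_digest = {}  # SHA-256 of contents -> index into unique_members
    symtabs = {}
    for archive_name, members in archives.items():
        table = {}
        for member_name, data, symbols in members:
            digest = hashlib.sha256(data).digest()
            if digest not in index_by_digest:
                index_by_digest[digest] = len(unique_members)
                unique_members.append((member_name, data))
            for symbol in symbols:
                # Keep the first member of *this* archive that defines the symbol;
                # definitions from other archives never enter this table.
                table.setdefault(symbol, index_by_digest[digest])
        symtabs[archive_name] = table
    return unique_members, symtabs
```

For two archives that share common.o but define g in different members, the shared member is stored once while each table still resolves g to its own archive's member, which matches the per-archive indexing described in the proposal.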

Thanks,
Richard

Relph, Richard

2012-Dec-03 22:08 UTC


[LLVMdev] LLVM Archive Format Extension Proposal

On Nov 21, 2012, at 4:28 PM, "Relph, Richard" <Richard.Relph at amd.com> wrote:
> On Nov 21, 2012, at 12:09 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> 
>> Note that I plan to remove llvm/Bitcode/Archive once Object/Archive is
>> capable of replacing it. The llvm tools that don't write archive
>> files have already been switched over to it. Object/Archive already
>> supports MemoryBuffer as a source for the data.
> 
> I had meant to ask in my email about the apparent duplication of Archive in
> Bitcode and Object libs… Good to know. Since ranlib currently uses Bitcode,
> that's what I've been focusing on, but I had noticed the Object/Archive.h.
Michael,
    I understand and agree that having 2 Archive implementations is something
that should be fixed. Do you have a rough idea about when you might do the
unification?
    Also, why unify around the Object/Archive implementation instead of the
Bitcode/Archive implementation? What can the Object/Archive implementation
"do" that can't be done with the Bitcode implementation?
    I ask because after looking at Archive in Object and Archive in Bitcode, the
Archive in Bitcode seems much better documented than the Archive in Object, and
feels (at least to me at first glance) like a somewhat better model of what
Archives are. And as you've already noted, Object/Archive can't do
writes...

Richard

Michael Spencer

2012-Dec-06 05:10 UTC


[LLVMdev] LLVM Archive Format Extension Proposal

On Mon, Dec 3, 2012 at 2:08 PM, Relph, Richard <Richard.Relph at amd.com> wrote:
> On Nov 21, 2012, at 4:28 PM, "Relph, Richard" <Richard.Relph at amd.com> wrote:
>
>> On Nov 21, 2012, at 12:09 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
>>
>>> Note that I plan to remove llvm/Bitcode/Archive once Object/Archive is
>>> capable of replacing it. The llvm tools that don't write archive
>>> files have already been switched over to it. Object/Archive already
>>> supports MemoryBuffer as a source for the data.
>>
>> I had meant to ask in my email about the apparent duplication of Archive in
>> Bitcode and Object libs… Good to know. Since ranlib currently uses Bitcode,
>> that's what I've been focusing on, but I had noticed the Object/Archive.h.
>
> Michael,
>     I understand and agree that having 2 Archive implementations is
> something that should be fixed. Do you have a rough idea about when you might
> do the unification?
>     Also, why unify around the Object/Archive implementation instead of the
> Bitcode/Archive implementation? What can the Object/Archive implementation
> "do" that can't be done with the Bitcode implementation?
>     I ask because after looking at Archive in Object and Archive in Bitcode,
> the Archive in Bitcode seems much better documented than the Archive in
> Object, and feels (at least to me at first glance) like a somewhat better
> model of what Archives are. And as you've already noted, Object/Archive can't
> do writes...
>
> Richard
I wrote Object/Archive for a couple of reasons. The main reason was
performance. Bitcode/Archive parses the entire archive file up front,
including the symbol table. Object/Archive does it lazily and uses
much less memory. The other reason is that Bitcode/Archive is heavily
focused on bitcode files. It even requires an LLVMContext to
construct. This was not optimal for my object file needs.

- Michael Spencer
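To illustrate the eager-versus-lazy distinction Michael describes in general terms (this is a hypothetical Python sketch, not Object/Archive's actual code), the generator below decodes member headers one at a time and returns offsets into the caller's buffer instead of copying member data. A lookup that stops at the first match never decodes the remaining headers, whereas an eager reader parses the whole file before answering.

```python
from typing import Iterator, NamedTuple, Optional

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


class Member(NamedTuple):
    name: str
    offset: int  # offset of the member's data within the caller's buffer
    size: int


def iter_members(buf: bytes) -> Iterator[Member]:
    """Lazily yield members; a header is decoded only when the caller asks for it."""
    if not buf.startswith(AR_MAGIC):
        raise ValueError("missing !<arch> signature")
    pos = len(AR_MAGIC)
    while pos < len(buf):
        header = buf[pos:pos + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError(f"malformed member header at offset {pos}")
        size = int(header[48:58].decode("ascii"))
        # Member contents are not copied; callers slice buf[offset:offset + size].
        yield Member(header[0:16].decode("ascii").rstrip(" "), pos + HEADER_SIZE, size)
        pos += HEADER_SIZE + size + (size % 2)  # skip data plus the even-alignment pad


def find_member(buf: bytes, name: str) -> Optional[Member]:
    """Return the first member named `name`; later headers are never decoded."""
    return next((m for m in iter_members(buf) if m.name == name), None)
```

In this sketch, a buffer whose third header is corrupt still answers a lookup for the second member, while materializing the full member list fails; an eager parser would surface that error, and pay the full parsing cost, before any lookup.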
