On 07/29/2012 03:30 PM, Tobias Grosser wrote:
> On 07/26/2012 04:49 PM, Justin Holewinski wrote:
>> I'm not convinced that having multi-module IR files is the way to go.
>> It just seems like a lot of infrastructure/design work for little
>> gain. Can the embedded modules have embedded modules themselves? How
>> deep can this go? If not, then the embedded LLVM IR language is really
>> a subset of the full LLVM IR language. How do you share variables
>> between parent and embedded modules?
>
> I don't have final answers to these questions, but here are my current
> thoughts: I do not see a need for deeply nested modules, but I also
> don't see a big problem. Variables between parent and embedded modules
> are not shared; they live in separate address spaces.
But some targets may allow sharing variables; how would that be implemented?
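For reference, a device-side global pinned to a separate address space
might be created roughly like this through the C++ API (a sketch only;
the address-space number 1 is a placeholder, since each target defines
its own numbering):

  // Sketch: a device-module global in a non-default address space.
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/GlobalVariable.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/Type.h"
  using namespace llvm;

  static GlobalVariable *makeDeviceGlobal(Module &DeviceM) {
    Type *Int32Ty = Type::getInt32Ty(DeviceM.getContext());
    return new GlobalVariable(
        DeviceM, Int32Ty, /*isConstant=*/false,
        GlobalValue::ExternalLinkage, ConstantInt::get(Int32Ty, 0),
        "dev_counter", /*InsertBefore=*/nullptr,
        GlobalValue::NotThreadLocal, /*AddressSpace=*/1);
  }
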
>
>> I feel that this can be better solved by just using separate IR modules.
>> For your purposes, the pass that generates the device code can simply
>> create a new module, and the host code can refer to the generated code
>> by name. Then, you can run each module through opt and llc individually,
>> and then link them together somehow, like Dmitry's use of ELF
>> symbols/sections. This is exactly how CUDA binaries work; device code
>> is embedded into the host binary as special ELF sections. This would be
>> a bit more work on the part of your toolchain to make sure opt and llc
>> are executed for each produced module, but the changes are far fewer
>> than supporting sub-modules in a single IR file. This also has the
>> benefit that you do not need to change LLVM at all for this to work.
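To make that concrete, a rough sketch of the embedding step through the
C++ API might look like this (header paths assume a current tree; the
symbol and section names are placeholders, not what CUDA actually uses):

  // Serialize the device module to bitcode and stash it in the host
  // module as a constant placed in a dedicated section, so the static
  // linker carries it along into the final binary.
  #include "llvm/Bitcode/BitcodeWriter.h"
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/GlobalVariable.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  static void embedDeviceModule(Module &Host, Module &DeviceM) {
    std::string Buf;
    raw_string_ostream OS(Buf);
    WriteBitcodeToFile(DeviceM, OS);  // older trees take a Module* here
    OS.flush();

    Constant *Blob = ConstantDataArray::getString(
        Host.getContext(), Buf, /*AddNull=*/false);
    auto *GV = new GlobalVariable(Host, Blob->getType(),
                                  /*isConstant=*/true,
                                  GlobalValue::PrivateLinkage, Blob,
                                  "device_bitcode");   // placeholder name
    GV->setSection(".llvm.device.bc");                 // placeholder section
  }
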
>>
>> Is there some particular use-case that just won't work without
>> sub-module support? I know you like using the example of "clang -o - |
>> opt -o - | llc" but I'm just not convinced that retaining the ability
>> to pipe tools like that is justification enough to change such a
>> fundamental part of the LLVM system.
>
> As I mentioned to Duncan, I agree with you that for a specific tool
> chain, the approach you mentioned is probably best. However, I am
> aiming for a more generic approach: optimizer plugins that can be used
> in various LLVM-based compilers, without the need for larger changes to
> each of these compilers. Do you think that is a useful goal?
I think that the same can be achieved using already-existing
functionality, like archives. Granted, right now the command-line tools
cannot directly process archives containing bit-code files, but I
believe it would be more beneficial to support that than implementing
nested bit-code files. In any optimizer, you would have to set up a
different pass chain for different architectures anyway.
I feel that it would be reasonable to allow clang/opt to produce an
archive with multiple bit-code files instead of a single module as they
do today.
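For illustration, a tool could iterate the bit-code members of such an
archive roughly like this (a sketch against today's llvm::object and
bitcode-reader APIs; error handling is collapsed into cantFail):

  // Parse every member of the archive as a bitcode module; a real
  // opt-like driver would then pick a pass chain per module.
  #include "llvm/Bitcode/BitcodeReader.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Object/Archive.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/MemoryBuffer.h"
  #include <memory>
  #include <utility>
  #include <vector>
  using namespace llvm;

  static std::vector<std::unique_ptr<Module>>
  loadArchiveModules(MemoryBufferRef Buf, LLVMContext &Ctx) {
    std::vector<std::unique_ptr<Module>> Mods;
    auto Ar = cantFail(object::Archive::create(Buf));
    Error Err = Error::success();
    for (const object::Archive::Child &C : Ar->children(Err)) {
      MemoryBufferRef Member = cantFail(C.getMemoryBufferRef());
      Mods.push_back(cantFail(parseBitcodeFile(Member, Ctx)));
    }
    cantFail(std::move(Err));
    return Mods;
  }
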
Part of the issue I see with nested modules is how to invoke the
optimizer. To get the most performance out of the code, you'll probably
have to pass different options to opt for the host and device code. So
wouldn't you need to invoke opt multiple times anyway?
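As a sketch of what that looks like in-process, assuming the
PassBuilder-based pass manager and picking arbitrary O2/O3 levels for
host and device code:

  // Build a separate pipeline per module so host and device code get
  // different optimization settings; the O2/O3 split is just an example.
  #include "llvm/IR/Module.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Passes/PassBuilder.h"
  using namespace llvm;

  static void optimizeBoth(Module &HostM, Module &DeviceM) {
    LoopAnalysisManager LAM;
    FunctionAnalysisManager FAM;
    CGSCCAnalysisManager CGAM;
    ModuleAnalysisManager MAM;

    PassBuilder PB;
    PB.registerModuleAnalyses(MAM);
    PB.registerCGSCCAnalyses(CGAM);
    PB.registerFunctionAnalyses(FAM);
    PB.registerLoopAnalyses(LAM);
    PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

    // Two "opt invocations" in miniature: one pipeline per module.
    ModulePassManager HostPM =
        PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
    ModulePassManager DevicePM =
        PB.buildPerModuleDefaultPipeline(OptimizationLevel::O3);

    HostPM.run(HostM, MAM);
    DevicePM.run(DeviceM, MAM);
  }
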
>
> Thanks for your feedback
> Tobi
--
Thanks,
Justin Holewinski