thr3ads.net - llvm dev - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation [Apr 2012]

Yabin Hu

2012-Apr-28 08:25 UTC

[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Hi Justin,

Thanks very much for your comments.

2012/4/28 Justin Holewinski <justin.holewinski at gmail.com>
> On Fri, Apr 27, 2012 at 7:40 PM, Yabin Hu <yabin.hwu at gmail.com> wrote:
>
>> The attached patch adds a new Intrinsic named "llvm.codegen" to support
>> embedded LLVM IR code generation. The 'llvm.codegen' intrinsic uses
>> the LLVM back ends to generate code for embedded LLVM IR strings. The code
>> generation target can be same or different to the one of the parent module.
>>
>> The original motivation inspiring us to add this intrinsic, is to
>> generate code for heterogeneous platform. A test case in the patch demos
>> this. In the test case, on a X86 host, we use this intrinsic to transform
>> an embedded LLVM IR into a string of PTX assembly. We can then employ a
>> PTX execution engine (on CUDA Supported GPU) to execute the newly
>> generated assembly and copy back the result later.
>>
>
> I have to admit, I'm not sold on this solution.  First, there is no clear
> way to pass codegen flags to the back-end.  In PTX parlance, how would I
> embed an .ll file and compile to compute_13?

We can handle this by providing a new argument (e.g. a string describing a
properly-configured Target Machine) instead of or in addition to the Arch
type string argument.

> Second, this adds a layer of obfuscation to the system.  If I look at an
> .ll file, I expect to see all of the assembly in a reasonably clean syntax.
> If the device code is squashed into a constant array, it is much harder to
> read.
>
> Is the motivation for the intrinsic simply to preserve the ability to pipe
> LLVM commands together on the command-line, e.g. opt | llc?  I really feel
> that the cleaner solution is to split the IR into separate files, each of
> which can be processed independently after initial generation.

Yes, it is. Preserving that ability is the main benefit we get from
this intrinsic. It means we needn't implement another compiler driver or
JIT tool for our specific purpose. I agree with you that embedded LLVM IR
harms the readability of the .ll file.

>> The usage of this intrinsic is not limited to code generation for
>> heterogeneous platform. It can also help lots of (run-time) optimization
>> and security problems even when the code generation target is same as the
>> one of the parent module.
>>
>
> How does this help run-time optimization?

We implemented this intrinsic by following the style of LLVM's garbage
collector related intrinsics, which support various GC strategies. It could
help if the ASMGenerator in the patch were revised to accept various
optimization strategies provided by the user of the intrinsic. The intrinsic
would then do what the user wants to the input code string. When running the
code with JIT tools like lli, we could choose one optimization strategy at
run-time. Though this isn't supported currently, we try to make the design as
general as we can. The essential functionality of the intrinsic is that we
take an input code string, transform it into a new target-specific one, and
then replace the call to the intrinsic.
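
As a rough illustration of that last sentence, the lowering step can be
modelled in a few lines of Python. This is only a toy sketch and not the code
in the patch: `CodegenCall`, `Module`, `lower_codegen_calls` and `fake_backend`
are hypothetical names, the "ptx32" arch string is just an example value, and
the placeholder back end only tags the string instead of invoking LLVM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CodegenCall:
    """Toy stand-in for one call to llvm.codegen (hypothetical model)."""
    result_name: str   # name of the value that receives the result
    ir_text: str       # the embedded LLVM IR string
    arch: str          # target architecture name (example value only)


@dataclass
class Module:
    """Toy host module: pending intrinsic calls plus emitted string globals."""
    calls: List[CodegenCall] = field(default_factory=list)
    string_globals: Dict[str, str] = field(default_factory=dict)


def lower_codegen_calls(module: Module,
                        backend: Callable[[str, str], str]) -> None:
    """Replace every modelled llvm.codegen call with its generated assembly."""
    for call in module.calls:
        asm = backend(call.ir_text, call.arch)
        # The call site now refers to a string holding the target assembly.
        module.string_globals[call.result_name] = asm
    module.calls.clear()


def fake_backend(ir_text: str, arch: str) -> str:
    """Placeholder back end: a real one would run LLVM code generation."""
    return f"// generated for {arch}\n// from {len(ir_text)} bytes of IR\n"


host = Module(calls=[
    CodegenCall("kernel_asm", "define void @k() { ret void }", "ptx32"),
])
lower_codegen_calls(host, fake_backend)
```

After the call, `host.calls` is empty and `host.string_globals["kernel_asm"]`
holds the generated text, which mirrors "replace the call by a pointer to the
newly generated string".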

>> Each call to the intrinsic has two arguments. One is the LLVM IR string.
>> The other is the name of the target architecture. When running with tools
>> like llc, lli, etc, this intrinsic transforms the input LLVM IR string to
>> a new string of assembly code for the target architecture firstly. Then
>> the call to the intrinsic is replaced by a pointer to the newly generated
>> string. After this, we have in our module
>>
>
> Is the Arch parameter to llvm.codegen really needed?  Since codegen
> happens when lowering the intrinsic, the target architecture must be known.
> But if the target architecture is known, then it should be available in
> the triple for the embedded module.

Yes. It is better that the target data is set correctly in the embedded
module. It is the user's responsibility to do this.


Thanks again!

best regards,
Yabin

Tobias Grosser

2012-Apr-28 10:16 UTC

On 04/28/2012 10:25 AM, Yabin Hu wrote:
> Hi Justin,
>
> Thanks very much for your comments.
>
> 2012/4/28 Justin Holewinski <justin.holewinski at gmail.com
> <mailto:justin.holewinski at gmail.com>>
>
>     On Fri, Apr 27, 2012 at 7:40 PM, Yabin Hu <yabin.hwu at gmail.com
>     <mailto:yabin.hwu at gmail.com>> wrote:
>
>         The attached patch adds a new Intrinsic named "llvm.codegen" to
>         support embedded LLVM IR code generation. The 'llvm.codegen'
>         intrinsic uses the LLVM back ends to generate code for embedded
>         LLVM IR strings. The code generation target can be same or
>         different to the one of the parent module.
>
>
>         The original motivation inspiring us to add this intrinsic, is
>         to generate code for heterogeneous platform. A test case in the
>         patch demos this.  In the test case, on a X86 host, we use this
>         intrinsic to transform an embedded  LLVM IR into a string of PTX
>         assembly. We can then employ a PTX  execution engine ( on CUDA
>         Supported GPU ) to execute the newly generated assembly and copy
>         back the result later.
>
>
>     I have to admit, I'm not sold on this solution.  First, there is no
>     clear way to pass codegen flags to the back-end.  In PTX parlance,
>     how would I embed an .ll file and compile to compute_13?
>
> We can handle this by provide a new argument (e.g. a string of
> properly-configured Target Machine) instead of or in addition to the
> Arch type string argument.
I think we can discuss in general what additional information the back
ends need and provide it as parameters. We may want to do this on demand,
once we have agreed on the general usefulness of this intrinsic.
>     Second, this adds a layer of obfuscation to the system.  If I look
>     at an .ll file, I expect to see all of the assembly in a reasonably
>     clean syntax.  If the device code is squashed into a constant array,
>     it is much harder to read.
I agree with Justin. The embedded code is not readable within the 
constant array. For debugging purposes having the embedded module in 
separate files is better. I believe we can achieve this easily by adding 
a pass that extracts the embedded LLVM-IR code into separate files.
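
To make the text-level part of such an extraction concrete, here is a small
Python sketch that decodes the escaped body of an LLVM `c"..."` string
constant back into readable IR. It is a sketch under assumptions, not part of
the patch: the helper name `decode_llvm_cstring` is made up, and it only
handles the `\\` and two-hex-digit `\xx` escapes.

```python
import re

# A backslash followed by either another backslash or two hex digits.
_ESCAPE = re.compile(r"\\(\\|[0-9A-Fa-f]{2})")


def decode_llvm_cstring(body: str) -> str:
    """Decode the text between the quotes of an LLVM c"..." constant."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"
        return chr(int(token, 16))
    return _ESCAPE.sub(replace, body)


# Example body as it would appear inside the constant array in a .ll file.
embedded = r"define void @k() {\0A  ret void\0A}\0A\00"
# Drop the trailing NUL terminator to get the plain IR text.
readable = decode_llvm_cstring(embedded).rstrip("\x00")
```

A real pass would of course walk the module's global variables rather than
work on text, and would write `readable` out to a separate file.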
>     Is the motivation for the intrinsic simply to preserve the ability
>     to pipe LLVM commands together on the command-line, e.g. opt | llc?
>       I really feel that the cleaner solution is to split the IR into
>     separate files, each of which can be processed independently after
>     initial generation.
>
> Yes, it is. To preserve such an ability is the main benefit we got from
> this intrinsic. It means we needn't to implement another compiler driver
> or jit tool for our specific purpose. I agree with you that embedded
> llvm ir harms the readability of the .ll file.
I would like to add that embedding the device IR into the host IR fits 
very well in the LLVM code generation chain. It obviously makes running 
'opt | llc' possible, but it also enables us to write optimizations that
yield embedded GPU code.

To write optimizations that yield embedded GPU code, we also looked into 
three other approaches:

1. Directly create embedded target code (e.g. PTX)

This would mean the optimization pass extracts device code internally
and directly generates the relevant target code. This approach would
require our generic optimization pass to be directly linked with the
specific target back end. This is an ugly layering violation and, in
addition, it causes major trouble if the new optimization is to be
dynamically loaded.

2. Extend the LLVM-IR files to support heterogeneous modules

This would mean we extend LLVM-IR, such that IR for different targets
can be stored within a single IR file. This approach could be integrated 
nicely into the LLVM code generation flow and would yield readable 
LLVM-IR even for the device code. However, it adds another level of 
complexity to the LLVM-IR files and does not only require massive 
changes in the LLVM code base, but also in compilers built on top of 
LLVM-IR.

3. Generate two independent LLVM-IR files and pass them around together

The host and device LLVM-IR modules could be kept in separate files.
This has the benefit of being user readable and not adding additional
complexity to the LLVM-IR files themselves. However, separate files do not
provide information about how those files are related. Which files are
kernel files, how/where do they need to be loaded, ...? This
information could probably be put into meta-data or could be hard coded
into the generic compiler infrastructure, but this would require
significant additional code.
Another weakness of this approach is that the entire LLVM optimization
chain is currently built under the assumption that a single file/module
is passed around. This is most obvious with the 'opt | llc' idiom, but in
general every existing tool would need to be adapted to
handle multiple files and would possibly even need semantic knowledge
about how to connect/use them together. Just running clang or
dragonegg with -load GPGPUOptimizer.so would not be possible.

All of the previous approaches require significant changes all over the 
code base and would cause trouble with loadable optimization passes. The 
intrinsic based approach seems to address most of the previous problems.

The intrinsic based approach requires only small changes, restricted to LLVM
itself. In particular, it works without changes to the established LLVM
optimization chain. 'opt | llc' will work out of the box, but, more
importantly, any LLVM based compiler can directly load a
GPGPUOptimizer.so file to gain a GPU based accelerator. Besides the need
to load some runtime library, no additional knowledge needs to be 
embedded in individual compiler implementations, but all the logic of 
GPGPU code generation can remain within a single LLVM optimization pass. 
Another nice feature of the intrinsic is that the relation between host 
and device code is explicitly encoded in the LLVM-IR (with the 
llvm.codegen function calls). There is no need to put this information 
into individual tools and/or to carry it through meta-data. Instead the 
precise semantics are directly available through LLVM-IR.

Justin: With your proposed two-file approach, what changes would be
needed to add e.g. GPGPU code generation support to clang/dragonegg or
haskell+LLVM? Can you see a way this can be done without large changes
to each of these users?
>         The usage of this intrinsic is not limited to code generation
>         for heterogeneous platform. It can also help lots of (run-time)
>         optimization and security problems even when the code generation
>         target is same as the one of the parent module.
>
>
>     How does this help run-time optimization?
>
> We implement this intrinsic by learning the implementation style of
> llvm's garbage collector related intrinsics which support various GC
> strategies. It can help if the ASMGenerator in the patch is revised to
> be able to accept various optimization strategies provided by the user
> of this intrinsic. Then the intrinsic will do what the user wants to the
> input code string. When running the code with lli like jit tools, we can
> choose one optimization strategy at run-time. Though haven't supported
> this currently, we try to make the design as general as we can. The
> essential functionality of this intrinsic is that we get an input code
> string, transform it into a target-specific new one then replace the
> call to the intrinsic.
There may be uses like this, but I am not sure the llvm.codegen()
intrinsic is the best way to implement them. Even though we made it
generic and it can possibly be used in other ways, I suggest we focus
for now on its use for heterogeneous computing. This is where it
is needed today and where we can easily check if it does what we need.
>         Each call to the intrinsic has two arguments. One is the LLVM IR
>         string. The other is the name of the target architecture. When
>         running with tools like llc, lli, etc, this intrinsic transforms
>         the input LLVM IR string  to a new string of assembly code for
>         the target architecture firstly. Then the call to the intrinsic
>         is replaced by a pointer to the newly generated string. After
>         this, we have in our module
>
>
>     Is the Arch parameter to llvm.codegen really needed?  Since codegen
>     happens when lowering the intrinsic, the target architecture must be
>     known.  But if the target architecture is known, then it should be
>     available in the triple for the embedded module.
>
> Yes. It is better that the target data is set correctly in the embedded
> module. It is the user's responsibility to do this.
OK. Why don't we require the triple to be set and remove the arch 
parameter again?

Tobi

Yabin Hu

2012-Apr-28 11:17 UTC

Hi Tobi,

2012/4/28 Tobias Grosser <tobias at grosser.es>
>        Each call to the intrinsic has two arguments. One is the LLVM IR
>>        string. The other is the name of the target architecture. When
>>        running with tools like llc, lli, etc, this intrinsic transforms
>>        the input LLVM IR string  to a new string of assembly code for
>>        the target architecture firstly. Then the call to the intrinsic
>>        is replaced by a pointer to the newly generated string. After
>>        this, we have in our module
>>
>>
>>    Is the Arch parameter to llvm.codegen really needed?  Since codegen
>>    happens when lowering the intrinsic, the target architecture must be
>>    known.  But if the target architecture is known, then it should be
>>    available in the triple for the embedded module.
>>
>> Yes. It is better that the target data is set correctly in the embedded
>> module. It is the user's responsibility to do this.
>>
>
> OK. Why don't we require the triple to be set and remove the arch
> parameter again?

I am afraid I didn't make this clear in the previous email, and I am sorry
that I didn't get it when you pointed this out before.

There are two ways we can deal with the triple of the embedded module.
1. The embedded LLVM IR string contains a relatively *complete* module, in
which the target triple is properly set. This means that when a user of the
intrinsic generates the embedded LLVM IR string, he needs to add not only the
function definitions but also the target triple information. When the
intrinsic extracts the string into a module, we check whether the triple is
empty. If it is, we return immediately or report an error. In this case, we
don't need the arch parameter.

2. There is no triple information in the embedded LLVM IR string. We get it
from the arch parameter.

With the 1st approach, we avoid the code that gets the arch string
from the arch llvm::Value and generates the triple from it. It
leads to fewer changes to LLVM than the 2nd approach, so maybe it is better. We
should add a note to the documentation telling the user to set the target
triple info properly in the embedded LLVM IR string.
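
The two approaches can be compared side by side in a short Python sketch.
This is only an illustration under assumptions: `embedded_triple` and
`resolve_triple` are hypothetical helpers, the triple is found with a regular
expression instead of by parsing the module, and the `-unknown-unknown`
suffix used for the arch fallback is a guess at how a triple might be built
from a bare arch name.

```python
import re
from typing import Optional

# Matches a module-level line such as: target triple = "x86_64-unknown-linux-gnu"
_TRIPLE = re.compile(r'^\s*target\s+triple\s*=\s*"([^"]*)"', re.MULTILINE)


def embedded_triple(ir_text: str) -> Optional[str]:
    """Return the triple declared in the IR text, or None if absent or empty."""
    match = _TRIPLE.search(ir_text)
    if match is None or not match.group(1):
        return None
    return match.group(1)


def resolve_triple(ir_text: str, arch: Optional[str] = None) -> str:
    """Prefer the embedded triple; optionally fall back to an arch argument."""
    triple = embedded_triple(ir_text)
    if triple is not None:
        return triple                       # approach 1: module is complete
    if arch is not None:
        return f"{arch}-unknown-unknown"    # approach 2: derive from arch
    raise ValueError("embedded module has no target triple")
```

With approach 1 alone, the `arch` branch disappears and an embedded module
without a triple is simply rejected.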

best regards,
Yabin

Justin Holewinski

2012-Apr-28 14:30 UTC

On Sat, Apr 28, 2012 at 3:16 AM, Tobias Grosser <tobias at grosser.es> wrote:
> On 04/28/2012 10:25 AM, Yabin Hu wrote:
>
>> Hi Justin,
>>
>> Thanks very much for your comments.
>>
>> 2012/4/28 Justin Holewinski <justin.holewinski at gmail.com
>> <mailto:justin.holewinski at gmail.com>>
>>
>>
>>    On Fri, Apr 27, 2012 at 7:40 PM, Yabin Hu <yabin.hwu at gmail.com
>>    <mailto:yabin.hwu at gmail.com>> wrote:
>>
>>        The attached patch adds a new Intrinsic named "llvm.codegen" to
>>        support embedded LLVM IR code generation. The 'llvm.codegen'
>>        intrinsic uses the LLVM back ends to generate code for embedded
>>        LLVM IR strings. The code generation target can be same or
>>        different to the one of the parent module.
>>
>>
>>        The original motivation inspiring us to add this intrinsic, is
>>        to generate code for heterogeneous platform. A test case in the
>>        patch demos this.  In the test case, on a X86 host, we use this
>>        intrinsic to transform an embedded  LLVM IR into a string of PTX
>>        assembly. We can then employ a PTX  execution engine ( on CUDA
>>        Supported GPU ) to execute the newly generated assembly and copy
>>        back the result later.
>>
>>
>>    I have to admit, I'm not sold on this solution.  First, there is no
>>    clear way to pass codegen flags to the back-end.  In PTX parlance,
>>    how would I embed an .ll file and compile to compute_13?
>>
>> We can handle this by provide a new argument (e.g. a string of
>> properly-configured Target Machine) instead of or in addition to the
>> Arch type string argument.
>>
>
> I think we may in general discuss the additional information needed for
> the back ends and provide the information as parameters. We may want to do
> this on demand, in case we agreed on the general usefulness of this
> intrinsic.

Any solution would need to be able to handle Feature flags (e.g.
-mattr=+sm_20), as well as generic llc options (e.g. -regalloc=greedy).
What happens when the options conflict with the original options passed to
llc?  The CodeGenIntrinsic pass would need to emulate all (most?) of llc,
but in a way that doesn't interfere with llc's global state.
Unfortunately, parameters like "regalloc=" are globals.  To do this
without massive LLVM changes, you may need to spawn another instance of llc
as a separate process.
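
For illustration only, the separate-process idea could look roughly like the
Python sketch below. It is not what the patch does: the helper names
`build_llc_command` and `run_llc` are made up, it assumes an `llc` binary on
the PATH, and the "ptx32" arch value is just an example. The flags themselves
(-march, -mattr, -regalloc, "-o -" for stdout and a trailing "-" for stdin)
are ordinary llc options.

```python
import subprocess
from typing import List, Sequence


def build_llc_command(march: str,
                      mattr: Sequence[str] = (),
                      extra_flags: Sequence[str] = (),
                      llc: str = "llc") -> List[str]:
    """Build an argv that reads IR from stdin and writes assembly to stdout."""
    argv = [llc, f"-march={march}"]
    if mattr:
        argv.append("-mattr=" + ",".join(mattr))
    argv.extend(extra_flags)
    argv += ["-o", "-", "-"]   # "-o -" selects stdout, trailing "-" is stdin
    return argv


def run_llc(ir_text: str, argv: Sequence[str]) -> str:
    """Run llc as a child process so its option globals stay isolated."""
    done = subprocess.run(list(argv), input=ir_text, capture_output=True,
                          text=True, check=True)
    return done.stdout


# Per-call options for the embedded module, independent of the host's llc run.
cmd = build_llc_command("ptx32", mattr=["+sm_20"],
                        extra_flags=["-regalloc=greedy"])
```

Because the child process gets its own command line, options for the embedded
module cannot clash with those given to the outer llc, at the cost of process
start-up and of passing the IR through a pipe.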

>
>
>     Second, this adds a layer of obfuscation to the system.  If I look
>>    at an .ll file, I expect to see all of the assembly in a reasonably
>>    clean syntax.  If the device code is squashed into a constant array,
>>    it is much harder to read.
>>
>
> I agree with Justin. The embedded code is not readable within the constant
> array. For debugging purposes having the embedded module in separate files
> is better. I believe we can achieve this easily by adding a pass that
> extracts the embedded LLVM-IR code into separate files.
>
>
>     Is the motivation for the intrinsic simply to preserve the ability
>>    to pipe LLVM commands together on the command-line, e.g. opt | llc?
>>      I really feel that the cleaner solution is to split the IR into
>>    separate files, each of which can be processed independently after
>>    initial generation.
>>
>> Yes, it is. To preserve such an ability is the main benefit we got from
>> this intrinsic. It means we needn't to implement another compiler driver
>> or jit tool for our specific purpose. I agree with you that embedded
>> llvm ir harms the readability of the .ll file.
>>
>
> I would like to add that embedding the device IR into the host IR fits
> very well in the LLVM code generation chain. It obviously makes running
> 'opt | llc' possible, but it also enables us to write optimizations that
> yield embedded GPU code.
>
> To write optimizations that yield embedded GPU code, we also looked into
> three other approaches:
>
> 1. Directly create embedded target code (e.g. PTX)
>
> This would mean the optimization pass extracts device code internally and
> directly generate the relevant target code. This approach would require our
> generic optimization pass to be directly linked with the specific target
> back end. This is an ugly layering violation and, in addition, it causes
> major troubles in case the new optimization should be dynamically loaded.
>
I agree that this isn't desirable.  The optimizer should never have to
generate device code.

>
> 2. Extend the LLVM-IR files to support heterogeneous modules
>
> This would mean we extend LLVM-IR, such that IR for different targets
> can be stored within a single IR file. This approach could be integrated
> nicely into the LLVM code generation flow and would yield readable LLVM-IR
> even for the device code. However, it adds another level of complexity to
> the LLVM-IR files and does not only require massive changes in the LLVM
> code base, but also in compilers built on top of LLVM-IR.
>
> 3. Generate two independent LLVM-IR files and pass them around together
>
> The host and device LLVM-IR modules could be kept in separate files. This
> has the benefit of being user readable and not adding additional complexity
> to the LLVM-IR files itself. However, separate files do not provide
> information about how those files are related. Which files are kernel
> files, how.where do they need to be loaded, ...? Also this information
> could probably be put into meta-data or could be hard coded
> into the generic compiler infrastructure, but this would require
> significant additional code.
> Another weakness of this approach is that the entire LLVM optimization
> chain is currently built under the assumption that a single file/module
> passed around. This is most obvious with the 'opt | llc' idiom, but in
> general every tool that does currently exist would need to be adapted to
> handle multiple files and would possibly even need semantic knowledge about
> how to connect/use them together. Just running clang or
> dragonegg with -load GPGPUOptimizer.so would not be possible.
>
> All of the previous approaches require significant changes all over the
> code base and would cause trouble with loadable optimization passes. The
> intrinsic based approach seems to address most of the previous problems.
>
> The intrinsic based approach requires little changes restricted to LLVM
> itself. It especially works without changes to the established LLVM
> optimization chain. 'opt | llc' will work out of the box, but, more
> importantly, any LLVM based compiler can directly load a GPGPUOptimizer.so
> file to gain a GPU based accelerator. Besides the need to load some runtime
> library, no additional knowledge needs to be embedded in individual
> compiler implementations, but all the logic of GPGPU code generation can
> remain within a single LLVM optimization pass. Another nice feature of the
> intrinsic is that the relation between host and device code is explicitly
> encoded in the LLVM-IR (with the llvm.codegen function calls). There is no
> need to put this information into individual tools and/or to carry it
> through meta-data. Instead the precise semantics are directly available
> through LLVM-IR.
>
I just worry about the scalability of this approach.  Once you embed the
IR, no optimizer can touch it, so this potentially creates problems with
pass scheduling.  When you generate the IR, you want it to be fully
optimized before embedding.  Or, you could invoke opt+llc when lowering the
llvm.codegen intrinsic.

>
> Justin: With your proposed two-file approach? What changes would be needed
> to add e.g. GPGPU code generation support to clang/dragonegg or
> haskell+LLVM? Can you see a way, this can be done without large changes
> to each of these users?
>
To be fair, I'm not necessarily advocating the two-file approach.  It has
its shortcomings, too.  But this is in some sense the crux of the problem.
The intrinsic approach is clearly the path of least resistance, especially
in the case of the GSoC project.  However, I think a more long-term
solution involves looking at this problem from the IR level.  The current
LLVM approach is "one arch in, one arch out".  As far as I know, even ARM
needs separate modules for ARM vs. Thumb (please correct me if I'm
mistaken).  Whether the tools are extended to support multiple outputs with
some linking information or the IR is extended to support something like
per-function target triples, that is a decision that would need to be
addressed by the entire LLVM community.

>>        The usage of this intrinsic is not limited to code generation
>>        for heterogeneous platform. It can also help lots of (run-time)
>>        optimization and security problems even when the code generation
>>        target is same as the one of the parent module.
>>
>>
>>    How does this help run-time optimization?
>>
>> We implement this intrinsic by learning the implementation style of
>> llvm's garbage collector related intrinsics which support various GC
>> strategies. It can help if the ASMGenerator in the patch is revised to
>> be able to accept various optimization strategies provided by the user
>> of this intrinsic. Then the intrinsic will do what the user wants to the
>> input code string. When running the code with lli like jit tools, we can
>> choose one optimization strategy at run-time. Though haven't supported
>> this currently, we try to make the design as general as we can. The
>> essential functionality of this intrinsic is that we get an input code
>> string, transform it into a target-specific new one then replace the
>> call to the intrinsic.
>>
>
> There may be uses like this, but I am not sure if the llvm.codegen()
> intrinsic is the best way to implement this. Even though we made it generic
> and it can possibly be used in other ways, I suggest to currently focus on
> the use for heterogeneous computing. This is where it is needed today and
> where we can easily check if it does what we need.
>
>
>         Each call to the intrinsic has two arguments. One is the LLVM IR
>>        string. The other is the name of the target architecture. When
>>        running with tools like llc, lli, etc, this intrinsic transforms
>>        the input LLVM IR string  to a new string of assembly code for
>>        the target architecture firstly. Then the call to the intrinsic
>>        is replaced by a pointer to the newly generated string. After
>>        this, we have in our module
>>
>>
>>    Is the Arch parameter to llvm.codegen really needed?  Since codegen
>>    happens when lowering the intrinsic, the target architecture must be
>>    known.  But if the target architecture is known, then it should be
>>    available in the triple for the embedded module.
>>
>> Yes. It is better that the target data is set correctly in the embedded
>> module. It is the user's responsibility to do this.
>>
>
> OK. Why don't we require the triple to be set and remove the arch
> parameter again?
>
> Tobi
>


-- 

Thanks,

Justin Holewinski

dag at cray.com

2012-Apr-30 19:55 UTC

Tobias Grosser <tobias at grosser.es> writes:
> To write optimizations that yield embedded GPU code, we also looked into 
> three other approaches:
>
> 1. Directly create embedded target code (e.g. PTX)
>
> This would mean the optimization pass extracts device code internally 
> and directly generate the relevant target code. This approach would 
> require our generic optimization pass to be directly linked with the 
> specific target back end. This is an ugly layering violation and, in 
> addition, it causes major troubles in case the new optimization should 
> be dynamically loaded.
IMHO it's a bit unrealistic to have a target-independent optimization
layer.  Almost all optimization wants to know target details at some
point.  I think we can and probably should support that.  We can allow
passes to gracefully fall back in the cases where target information is
not available.
> 2. Extend the LLVM-IR files to support heterogeneous modules
>
> This would mean we extend LLVM-IR, such that IR for different targets
> can be stored within a single IR file. This approach could be integrated 
> nicely into the LLVM code generation flow and would yield readable 
> LLVM-IR even for the device code. However, it adds another level of 
> complexity to the LLVM-IR files and does not only require massive 
> changes in the LLVM code base, but also in compilers built on top of 
> LLVM-IR.
I don't think the code base changes are all that bad.  We have a number
of them to support generating code one function at a time rather than a
whole module together.  They've been sitting around waiting for us to
send them upstream.  It would be an easy matter to simply annotate each
function with its target.  We don't currently do that because we never
write out such IR files but it seems like a simple problem to solve to
me.
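
One way to picture the per-function annotation idea is the Python sketch
below. It is purely illustrative and is not the set of changes mentioned
above: `partition_by_target` is a hypothetical helper, functions are modelled
as (name, triple) pairs, and the triples are example values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_by_target(functions: Iterable[Tuple[str, str]],
                        default_triple: str) -> Dict[str, List[str]]:
    """Group function names by their target triple annotation.

    An empty annotation means "use the module's default triple".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, triple in functions:
        groups[triple or default_triple].append(name)
    return dict(groups)


# A heterogeneous module: two host functions and one annotated device kernel.
groups = partition_by_target(
    [("main", ""), ("saxpy_kernel", "ptx64-unknown-unknown"), ("helper", "")],
    default_triple="x86_64-unknown-linux-gnu",
)
```

Each resulting group could then be handed to the matching back end one
function at a time, which is the kind of per-function code generation
described above.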
> 3. Generate two independent LLVM-IR files and pass them around together
>
> The host and device LLVM-IR modules could be kept in separate files. 
> This has the benefit of being user readable and not adding additional 
> complexity to the LLVM-IR files itself. However, separate files do not 
> provide information about how those files are related. Which files are 
> kernel files, how.where do they need to be loaded, ...? Also this 
> information could probably be put into meta-data or could be hard coded
> into the generic compiler infrastructure, but this would require 
> significant additional code.
I don't think metadata would work because it would not satisfy the "no
semantic effects" requirement.  We couldn't just drop the metadata and
expect things to work.
> Another weakness of this approach is that the entire LLVM optimization 
> chain is currently built under the assumption that a single file/module 
> passed around. This is most obvious with the 'opt | llc' idiom, but in
> general every tool that does currently exist would need to be adapted to
> handle multiple files and would possibly even need semantic knowledge
> about how to connect/use them together. Just running clang or
> dragonegg with -load GPGPUOptimizer.so would not be possible.
Again, we have many of the changes to make this possible.  I hope to
send them for review as we upgrade to 3.1.
> All of the previous approaches require significant changes all over the 
> code base and would cause trouble with loadable optimization passes. The 
> intrinsic based approach seems to address most of the previous problems.
I'm pretty uncomfortable with the proposed intrinsic.  It feels
tacked-on and not in the LLVM spirit.  We should be able to extend the
IR to support multiple targets.  We're going to need this kind of
support for much more than GPUs in the future.  Heterogeneous computing is
here to stay.

                             -Dave
