Justin Holewinski
2012-Apr-28  23:21 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On Sat, Apr 28, 2012 at 8:27 AM, Tobias Grosser <tobias at grosser.es> wrote:

> On 04/28/2012 04:30 PM, Justin Holewinski wrote:
>
>> We can handle this by provide a new argument (e.g. a string of
>> properly-configured Target Machine) instead of or in addition to
>> the Arch type string argument.
>>
>> I think we may in general discuss the additional information needed
>> for the back ends and provide the information as parameters. We may
>> want to do this on demand, in case we agreed on the general
>> usefulness of this intrinsic.
>>
>> Any solution would need to be able to handle Feature flags (e.g.
>> -mattr=+sm_20), as well as generic llc options (e.g. -regalloc=greedy).
>> What happens when the options conflict with the original options
>> passed to llc? The CodeGenIntrinsic pass would need to emulate all
>> (most?) of llc, but in a way that doesn't interfere with llc's global
>> state. Unfortunately, parameters like "regalloc=" are globals. To do
>> this without massive LLVM changes, you may need to spawn another
>> instance of llc as a separate process.
>
> I think feature flags should not be a problem. The function
> createTargetMachine() takes a feature string. We can get this string as a
> parameter of the intrinsic and use it to parametrize the target machine. If
> needed, we can also add parameters to define the relocation model, mcpu,
> the code model, the optimization level or the target options. All those
> parameters are not influenced by the command line options of the llc
> invocation and will, for now, be set to default values for the embedded
> code generation.
>
> We should probably add the most important options now and add others on
> demand. Which are the options you suggest to be added initially? I suppose
> we need 1) the feature string and 2) mcpu. Is there anything else you would
> suggest?
>
> regalloc= is different. It is global and consequently influences both host
> and device code generation. However, to me it is rather a debugging option.
> It is never set by clang and targets provide a reasonable default based on
> the optimization level. I believe we can assume that for our use case it is
> not set. In case it is really necessary to explicitly set the register
> allocator, the right solution would be to make regalloc a target option.

The regalloc= option was just an example of the types of flags that can be passed to llc, which are handled as global options instead of target options.

>> The intrinsic based approach requires little changes restricted to
>> LLVM itself. It especially works without changes to the established
>> LLVM optimization chain. 'opt | llc' will work out of the box, but,
>> more importantly, any LLVM based compiler can directly load a
>> GPGPUOptimzer.so file to gain a GPU based accelerator. Besides the
>> need to load some runtime library, no additional knowledge needs to
>> be embedded in individual compiler implementations, but all the
>> logic of GPGPU code generation can remain within a single LLVM
>> optimization pass. Another nice feature of the intrinsic is that the
>> relation between host and device code is explicitly encoded in the
>> LLVM-IR (with the llvm.codegen function calls). There is no need to
>> put this information into individual tools and/or to carry it
>> through meta-data. Instead the precise semantics are directly
>> available through LLVM-IR.
>>
>> I just worry about the scalability of this approach. Once you embed the
>> IR, no optimizer can touch it, so this potentially creates problems with
>> pass scheduling. When you generate the IR, you want it to be fully
>> optimized before embedding. Or, you could invoke opt+llc when lowering
>> the llvm.codegen intrinsic.
>
> Where do you see scalability problems?
>
> I agree that the llvm.codegen intrinsic is limited to plain code
> generation. Meaning it is an embedded llc. I do not expect any part of LLVM
> to be extended to reason about optimizing the embedded IR. The optimization
> that created this intrinsic is in charge of optimizing the embedded IR as
> needed. However, this is not a big problem. A generic LLVM-IR optimization
> pass can schedule the required optimizations as needed.

The implicit assumption seems to be that the host code wants the device code as assembly text. What happens when you need to link the device binary and upload it separately? Think automatic SPU codegen on Cell. Is it up to the host program to invoke the other target's linker?

>> Justin: With your proposed two-file approach? What changes would be
>> needed to add e.g. GPGPU code generation support to clang/dragonegg or
>> haskell+LLVM? Can you see a way, this can be done without large changes
>> to each of these users?
>>
>> To be fair, I'm not necessarily advocating the two-file approach. It
>> has its shortcomings, too. But this is in some sense the crux of the
>> problem. The intrinsic approach is clearly the path of least
>> resistance, especially in the case of the GSoC project. However, I
>> think a more long-term solution involves looking at this problem from
>> the IR level. The current LLVM approach is "one arch in, one arch out".
>> As far as I know, even ARM needs separate modules for ARM vs. Thumb
>> (please correct me if I'm mistaken). Whether the tools are extended to
>> support multiple outputs with some linking information or the IR is
>> extended to support something like per-function target triples, that is
>> a decision that would need to be addressed by the entire LLVM community.
>
> I agree that future work can be useful here. However, before spending a
> large amount of time to engineer a complex solution, I propose to start
> with the proposed light-weight approach. It is sufficient for our needs and
> will allow us to get the experience and infrastructure that can help us to
> choose and implement a more complex later on.

I agree that this approach is the best way to get short-term results, especially for the GSoC project.

> Tobi

--
Thanks,
Justin Holewinski
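The distinction discussed in this message, between per-target parameters passed explicitly and process-wide options such as regalloc=, can be illustrated with a small model. This is only a sketch and does not use any LLVM API: `GLOBAL_OPTIONS`, `codegen`, and `compile_host_with_embedded_device` are hypothetical names, and the triple and feature strings are illustrative values, not a verified configuration.

```python
# Hypothetical model: one process-wide option table, shared by every
# code generation run in the same process (as a global llc option would be).
GLOBAL_OPTIONS = {"regalloc": "default"}


def codegen(unit, triple, features=""):
    """Pretend to generate code and report the settings that were in effect.

    triple and features are explicit, per-call parameters; the register
    allocator is read from the shared global table.
    """
    return {
        "unit": unit,
        "triple": triple,
        "features": features,
        "regalloc": GLOBAL_OPTIONS["regalloc"],
    }


def compile_host_with_embedded_device():
    """Run host codegen and an embedded device codegen in one process."""
    host = codegen("host", "x86_64-unknown-linux-gnu")
    device = codegen("device", "nvptx64-nvidia-cuda", features="+sm_20")
    return host, device
```

In this model, setting `GLOBAL_OPTIONS["regalloc"] = "greedy"` before the call changes the result for both the host and the device unit, while the triple and feature string stay independent per unit. That is the behaviour the thread attributes to global options, and it is why moving such a setting into per-target options would isolate the two runs.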
Tobias Grosser
2012-Apr-29  13:37 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On 04/29/2012 01:21 AM, Justin Holewinski wrote:

> On Sat, Apr 28, 2012 at 8:27 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>     regalloc= is different. It is global and consequently influences
>     both host and device code generation. However, to me it is rather a
>     debugging option. It is never set by clang and targets provide a
>     reasonable default based on the optimization level. I believe we can
>     assume that for our use case it is not set. In case it is really
>     necessary to explicitly set the register allocator, the right
>     solution would be to make regalloc a target option.
>
> The regalloc= option was just an example of the types of flags that can
> be passed to llc, which are handled as global options instead of target
> options.

Yes, thanks for pointing us to this problem. For now I think we can ignore them as they are mostly debugging options and they can be included in the target options if needed.

> The implicit assumption seems to be that the host code wants the device
> code as assembly text. What happens when you need to link the device
> binary and upload it separately? Think automatic SPU codegen on Cell.
> Is it up to the host program to invoke the other target's linker?

OK, I get what you mean. The intrinsic is currently targeted at the OpenCL/CUDA model. It is the most widely used. Stuff like cell sounds interesting, but probably needs further thoughts. Even with OpenCL/CUDA, this intrinsic works currently only for PTX code generation, but I hope we can gain support for other GPU devices later on.

>     I agree that future work can be useful here. However, before
>     spending a large amount of time to engineer a complex solution, I
>     propose to start with the proposed light-weight approach. It is
>     sufficient for our needs and will allow us to get the experience and
>     infrastructure that can help us to choose and implement a more
>     complex later on.
>
> I agree that this approach is the best way to get short-term results,
> especially for the GSoC project.

OK, let's go ahead.

Yabin, can you update the patch with the following changes:

- Remove the Arch flag
- Document that we require a triple
- Add two new arguments that take a feature string and a mcpu
  flag (can be set to "", which means we use the default)

Cheers
Tobi
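The requested argument handling (a required triple, plus a feature string and an mcpu value where "" selects the default) can be sketched as follows. This is a minimal model under stated assumptions, not the patch itself: `EmbeddedCodegenArgs`, `resolve`, and the default values `"generic"` and `""` are hypothetical, and the example strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedCodegenArgs:
    """Hypothetical record of the arguments discussed for llvm.codegen."""
    ir: str             # embedded LLVM-IR text
    triple: str         # required target triple
    features: str = ""  # feature string; "" means use the default
    mcpu: str = ""      # CPU name; "" means use the default


def resolve(args, default_cpu="generic", default_features=""):
    """Fill in defaults for empty optional arguments.

    Raises ValueError when the triple is missing, since the thread asks
    for the triple to be documented as required.
    """
    if not args.triple:
        raise ValueError("a target triple is required")
    return {
        "triple": args.triple,
        "mcpu": args.mcpu or default_cpu,
        "features": args.features or default_features,
    }
```

For example, `resolve(EmbeddedCodegenArgs(ir="...", triple="nvptx64-nvidia-cuda"))` falls back to the assumed defaults for both optional values, while passing `features="+sm_20"` overrides only the feature string.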
张 媛媛
2012-Apr-29  14:26 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Hi,

On 2012-4-29, at 9:37 PM, Tobias Grosser wrote:

> On 04/29/2012 01:21 AM, Justin Holewinski wrote:
>>
>> On Sat, Apr 28, 2012 at 8:27 AM, Tobias Grosser <tobias at grosser.es> wrote:
>>
>>     regalloc= is different. It is global and consequently influences
>>     both host and device code generation. However, to me it is rather a
>>     debugging option. It is never set by clang and targets provide a
>>     reasonable default based on the optimization level. I believe we can
>>     assume that for our use case it is not set. In case it is really
>>     necessary to explicitly set the register allocator, the right
>>     solution would be to make regalloc a target option.
>>
>> The regalloc= option was just an example of the types of flags that can
>> be passed to llc, which are handled as global options instead of target
>> options.
>
> Yes, thanks for pointing us to this problem. For now I think we can
> ignore them as they are mostly debugging options and they can be
> included in the target options if needed.
>
>> The implicit assumption seems to be that the host code wants the device
>> code as assembly text. What happens when you need to link the device
>> binary and upload it separately? Think automatic SPU codegen on Cell.
>> Is it up to the host program to invoke the other target's linker?
>
> OK, I get what you mean. The intrinsic is currently targeted at the
> OpenCL/CUDA model. It is the most widely used. Stuff like cell sounds
> interesting, but probably needs further thoughts. Even with OpenCL/CUDA,
> this intrinsic works currently only for PTX code generation, but I hope
> we can gain support for other GPU devices later on.
>
>>     I agree that future work can be useful here. However, before
>>     spending a large amount of time to engineer a complex solution, I
>>     propose to start with the proposed light-weight approach. It is
>>     sufficient for our needs and will allow us to get the experience and
>>     infrastructure that can help us to choose and implement a more
>>     complex later on.
>>
>> I agree that this approach is the best way to get short-term results,
>> especially for the GSoC project.
>
> OK, let's go ahead.
>
> Yabin, can you update the patch with the following changes:
>
> - Remove the Arch flag
> - Document that we require a triple
> - Add two new arguments that take a feature string and a mcpu
>   flag (can be set to "", which means we use the default)

OK. I will do that. Thanks for all your comments.

best,
Yabin
Evan Cheng
2012-Apr-29  18:28 UTC
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On Apr 29, 2012, at 6:37 AM, Tobias Grosser wrote:

> OK, I get what you mean. The intrinsic is currently targeted at the
> OpenCL/CUDA model. It is the most widely used. Stuff like cell sounds
> interesting, but probably needs further thoughts. Even with OpenCL/CUDA,
> this intrinsic works currently only for PTX code generation, but I hope
> we can gain support for other GPU devices later on.
>
>>     I agree that future work can be useful here. However, before
>>     spending a large amount of time to engineer a complex solution, I
>>     propose to start with the proposed light-weight approach. It is
>>     sufficient for our needs and will allow us to get the experience and
>>     infrastructure that can help us to choose and implement a more
>>     complex later on.
>>
>> I agree that this approach is the best way to get short-term results,
>> especially for the GSoC project.
>
> OK, let's go ahead.
>
> Yabin, can you update the patch with the following changes:
>
> - Remove the Arch flag
> - Document that we require a triple
> - Add two new arguments that take a feature string and a mcpu
>   flag (can be set to "", which means we use the default)

Wait. I don't think there is enough justification for this to move forward. Apart from the technical issues that have already been raised, I can also see that this introduces a safety issue, since the embedded IR code is not checked / verified at compile time. Unless Chris says otherwise, I don't see this patch being accepted on trunk.

Evan

> Cheers
> Tobi