Hi all,

I'm currently building a DSL for a computer graphics project that is not unlike NVIDIA's Cg. I have an intrinsic with the following signature:

    float4 sample(texture tex, float2 coords);

It is translated to this LLVM IR:

    declare void @"sample"(%float4* noalias nocapture sret, %texture, %float2) nounwind readnone

The type float4 is basically an array of four floats, which cannot be returned directly on x86 using the traditional calling conventions, only via the sret mechanism.

You might already have spotted the "readnone" attribute, which is causing some problems: the GVN optimization pass seems to treat the sret pointer just like any other pointer to memory and eliminates all calls to the function, since it sees the function as returning void without touching any memory. Is there a way to make sure that the GVN pass interprets the sret argument as the actual return value of the function? Or are there other approaches I could try?

Currently, the only way to make sure that the sample function behaves as expected is to drop the "readnone" attribute, but that obviously hinders optimization ...

Thanks a lot,
Stephan
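The reasoning described above can be illustrated with a toy model. This is only a sketch of the logic, not LLVM's actual GVN or dead-code implementation: the `Call` struct, its fields, and the function names are all hypothetical. It assumes an optimizer that keeps a call only if the call may touch memory or if its returned value is used, which is why a void function marked readnone looks removable even though it writes through its sret pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not LLVM classes.
struct Call {
    std::string callee;
    bool readnone;     // callee promises not to read or write memory
    bool returnsVoid;  // the call produces no value
    bool resultUsed;   // some later instruction uses the returned value
};

// A call is considered removable when the attributes say it cannot touch
// memory and nothing consumes a returned value.
bool isTriviallyDead(const Call &c) {
    // A void call cannot have a used result; reject inconsistent input.
    assert(!(c.returnsVoid && c.resultUsed));
    bool hasLiveResult = !c.returnsVoid && c.resultUsed;
    return c.readnone && !hasLiveResult;
}

// Keep only the calls that the model cannot prove dead.
std::vector<Call> removeDeadCalls(const std::vector<Call> &calls) {
    std::vector<Call> kept;
    for (const Call &c : calls) {
        if (!isTriviallyDead(c)) {
            kept.push_back(c);
        }
    }
    return kept;
}
```

In this model, a call described as `{"sample", true, true, false}` (readnone, void, no used result) is dropped, while the same call without readnone, or a readnone call whose returned value is used, is kept. That matches the two outcomes discussed in the message: losing the calls, or dropping the attribute.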
On Mon, Oct 5, 2009 at 9:21 AM, Stephan Reiter <stephan.reiter at gmail.com> wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias nocapture sret, %texture,
> %float2) nounwind readnone
>
> The type float4 is basically an array of four floats, which cannot be
> returned directly on an x86 using the traditional calling conventions
> but only via the sret mechanism.
>
> You might already have spotted that "readnone" attribute, which is
> causing some problems: The GVN optimization pass seems to treat the
> sret pointer just like any other pointer to memory and eliminates all
> calls to the function, since it sees it as returning void without
> touching any memory. Is there a way to make sure that the GVN pass
> interprets the sret argument as the actual return value of the
> function? Or are there other approaches I could try?
>
> Currently, the only way to make sure that the sample function behaves
> as expected is to drop the "readnone" attribute, but that obviously
> hinders optimization ...
>
> Thanks a lot,
> Stephan

I believe you are out of luck for the time being. I plan to change the codegen stage so that it handles large struct returns; then you could declare your function to return the four floats directly and mark it readnone. But I don't have a target date for that change.
Hi Stephan,

> You might already have spotted that "readnone" attribute, which is
> causing some problems: The GVN optimization pass seems to treat the
> sret pointer just like any other pointer to memory and eliminates all
> calls to the function, since it sees it as returning void without
> touching any memory.

As explained in the language reference, http://llvm.org/docs/LangRef.html, readonly functions must not write to any byval arguments. The reason for this is that it allows the inliner to avoid introducing a temporary variable and copy when inlining readonly functions with a byval argument.

> Is there a way to make sure that the GVN pass
> interprets the sret argument as the actual return value of the
> function? Or are there other approaches I could try?

Not for the moment, sorry.

Ciao,
Duncan.
On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias nocapture sret, %texture,
> %float2) nounwind readnone

The best way to handle this is to add a custom AliasAnalysis implementation, which will know the precise mod/ref sets for the function. See docs/AliasAnalysis.html for some more information.

-Chris
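To make the idea of "precise mod/ref sets" concrete, here is a minimal toy sketch. It does not use the LLVM AliasAnalysis interface; the `ModRef` enum, `CalleeInfo`, `IntrinsicModRefTable`, and `makeSampleTable` are hypothetical names invented for illustration. It assumes that a custom analysis would record, per known function, how the callee accesses the memory behind each pointer argument and how it accesses all other memory, and would answer conservatively for functions it does not know.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model only: hypothetical names, not the LLVM AliasAnalysis interface.
enum ModRef { NoModRef = 0, Ref = 1, Mod = 2, ModRefBoth = 3 };

// What is known about one callee.
struct CalleeInfo {
    std::map<std::size_t, ModRef> perArgument;  // argument index -> access
    ModRef otherMemory = ModRefBoth;            // access to all other memory
};

class IntrinsicModRefTable {
    std::map<std::string, CalleeInfo> table;

public:
    void add(const std::string &name, const CalleeInfo &info) {
        assert(!name.empty());
        table[name] = info;
    }

    // How may a call to `callee` access memory passed as argument `argIndex`?
    ModRef queryArgument(const std::string &callee, std::size_t argIndex) const {
        auto it = table.find(callee);
        if (it == table.end()) {
            return ModRefBoth;  // unknown callee: assume the worst
        }
        auto arg = it->second.perArgument.find(argIndex);
        return arg == it->second.perArgument.end() ? NoModRef : arg->second;
    }

    // How may a call to `callee` access memory not reachable via its arguments?
    ModRef queryOtherMemory(const std::string &callee) const {
        auto it = table.find(callee);
        return it == table.end() ? ModRefBoth : it->second.otherMemory;
    }
};

// "sample" writes its result through argument 0 (the sret slot) and, by
// assumption, touches no other memory.
IntrinsicModRefTable makeSampleTable() {
    CalleeInfo info;
    info.perArgument[0] = Mod;
    info.otherMemory = NoModRef;
    IntrinsicModRefTable t;
    t.add("sample", info);
    return t;
}
```

With such a table, `queryArgument("sample", 0)` reports a write to the result slot, so the call cannot be treated as having no effect, while `queryOtherMemory("sample")` reports no access, so loads and stores to unrelated memory would not need to be ordered against the call. One reading of the suggestion is that the declaration would then no longer need to carry readnone, with the analysis supplying the more precise information instead.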
On Oct 5, 2009, at 7:21 AM, Stephan Reiter wrote:
> Hi all,
>
> I'm currently building a DSL for a computer graphics project that is
> not unlike NVIDIA's Cg. I have an intrinsic with the following
> signature
>
> float4 sample(texture tex, float2 coords);
>
> that is translated to this LLVM IR code:
>
> declare void @"sample"(%float4* noalias nocapture sret, %texture,
> %float2) nounwind readnone
>
> The type float4 is basically an array of four floats, which cannot be
> returned directly on an x86 using the traditional calling conventions
> but only via the sret mechanism.

Is there a reason it needs to be an array? A vector of four floats wouldn't have this problem, if that's an option.

Dan
On 5 Oct., 23:33, Dan Gohman <goh... at apple.com> wrote:
>
> Is there a reason it needs to be an array? A vector of four floats
> wouldn't have this problem, if that's an option.
>

Unfortunately that's not an option. At the moment I'm restricting myself to scalar code only, so that I can vectorize the code easily later (e.g., float4 as it is now will then become an array of four vectors for parallel processing of n (probably 4, SSE) pixels). But thanks for coming up with this idea!

Chris, I'll take a look at the AliasAnalysis functionality. Depending on how much effort it is to implement a solution, I might follow this approach. If not, there's still Kenneth's new code generator to look forward to. :)

Thanks,
Stephan