Hi all,

Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it's looking quite revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr

It includes powerful 'gather' instructions, which allow reading multiple vector elements from non-contiguous memory locations. It also extends all integer vector instructions to 256-bit, and can shift vector elements by independent counts. This offers tremendous opportunity for auto-vectorizing loops, since practically every scalar operation will have a direct vector equivalent. It also facilitates implementing throughput computing languages like OpenCL.

So I was wondering whether in LLVM a gather operation is best represented with a 'load' instruction taking vector operands, or whether it's better to define it as a separate 'gather' instruction. What would be the pros and cons of each approach, and what do you think should be the long-term goals for the LLVM instruction set?

Cheers,
Nicolas
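For readers following the thread, here is a minimal scalar sketch of the kind of loop the post has in mind: each iteration does an indexed load (the scalar form of a gather) followed by a shift by its own count. The function and array names are hypothetical, and this is plain C describing the semantics, not AVX2 intrinsics or LLVM IR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop. A vectorizer with gather and per-element
 * shift support could, in principle, process several iterations at once. */
void lookup_and_shift(uint32_t *out, const uint32_t *table,
                      const uint32_t *index, const uint32_t *count, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t v = table[index[i]];   /* non-contiguous read */

        /* Shifting a 32-bit value by 32 or more is undefined in C, so
         * this sketch chooses to produce 0 for such counts. */
        out[i] = (count[i] < 32) ? (v << count[i]) : 0;
    }
}
```

Without a gather, the `table[index[i]]` read is typically the part that keeps such a loop scalar, since the addresses are not contiguous.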
The important thing, IMO, is not to represent the gather operation as an instruction that takes a vector of pointers, because that's too restrictive for architectures with 64-bit pointers.

What one most frequently wants to do on those architectures is to specify a 64-bit scalar base pointer with a vector of 32-bit offsets. This fits what the VGATHERxxx instructions described in the spec provide, and it fits what we need, e.g., for efficient texture lookups in software rendering.

Whether this is represented by extending the LLVM load instruction to take "load pointer, offset", or by using a new "gather pointer, offset" LLVM instruction, is a lesser issue IMO.

Jose
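To make the two representations concrete, here is a rough scalar sketch in C. The names, the `float` element type, and the lane count of 8 are assumptions for illustration; neither function is LLVM IR or an AVX2 intrinsic. The size remarks in the comments are one reading of the concern about 64-bit pointers, not something stated in the message.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* assumed: 8 x 32-bit elements in a 256-bit vector */

/* Form A: one scalar base pointer plus a vector of 32-bit offsets.
 * Offsets are element indices here; the offset vector occupies
 * 8 * 4 = 32 bytes regardless of pointer width. */
void gather_base_offsets(float out[LANES], const float *base,
                         const int32_t offsets[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = base[offsets[i]];
}

/* Form B: a vector of full pointers. With 64-bit pointers the address
 * vector occupies 8 * 8 = 64 bytes, twice the size of the offsets above. */
void gather_pointers(float out[LANES], const float *const ptrs[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        out[i] = *ptrs[i];
}
```

A texture lookup maps naturally onto Form A: `base` would be the start of the texel array and `offsets` the per-pixel texel indices.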
Jose Fonseca <jfonseca at vmware.com> writes:

> The important thing IMO, is to not represent the gather operation as
> an instruction which takes a vector of pointers, because that's too
> restrictive for architectures with 64bits pointers.

How is it restrictive?

> What one most frequently wants to do in those architectures is to specify a
> 64bit scalar base pointer with a vector of 32bit offsets.

Or 64-bit offsets. We should not restrict offsets to 32 bits.

-Dave

"Nicolas Capens" <nicolas.capens at gmail.com> writes:

> Hi all,
>
> Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it's looking quite
> revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr

Hooray! But boo! No 64-bit integer multiply yet. :(

In any case, once I get the major AVX changes in, it should be almost trivial to add HNI support. I've got some more patches ready to go in ASAP. We're very close to sending "the big one" up. :)

-Dave