Stanislav Manilov
2013-Oct-02 08:54 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
Hello Hal,

I am not very familiar with the DSP capabilities of PowerPC, but I imagine there will be instructions for simple vector operations like vector addition, multiplication, etc., so for these I imagine the implementation would consist of just outputting the correct instruction. However, for NEON instructions like the reciprocal step (see infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHDIACI.html) it is unlikely that there is a corresponding PowerPC vector instruction, so these will need to be emulated, yes.

- Stan

On 2 October 2013 04:14, Hal Finkel <hfinkel at anl.gov> wrote:
> Stan,
>
> Do you mean that you want to emulate the ARM NEON intrinsics on PowerPC?
>
> -Hal
>
> ----- Original Message -----
> > Hello LLVM Devs,
> >
> > Thanks for helping me previously to cross-compile for ARM, I managed
> > to get a working toolchain and am currently having fun compiling
> > different toy problems and running them on a pandaboard.
> >
> > As part of my research I am trying to implement the ARM NEON
> > Intrinsics in the PowerPC LLVM backend. I am still at the beginning
> > of my efforts and am not yet familiar with either the ARM or the
> > PowerPC backends. After I started investigating the code and found
> > out that in total it is more than 100 kloc for the two backends I
> > thought it is a good idea to ask you for some hints of where I
> > should start from.
> >
> > I have written a small unrelated experimental backend for LLVM
> > before, so I have some experience with the topic.
> >
> > Thanks,
> > - Stan
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
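To make "emulated" concrete: the NEON reciprocal step computes 2 - a*b in each lane, and it is normally used as one Newton-Raphson refinement of a reciprocal estimate. The following is a minimal sketch in portable scalar C rather than real NEON or AltiVec intrinsics; the helper names emu_vrecps_f32 and emu_recip_refine are hypothetical, and the four-lane layout is an assumption matching a 128-bit register of 32-bit floats.

```c
#define LANES 4  /* assumption: one 128-bit register of four 32-bit floats */

/* Hypothetical helper: lane-wise emulation of the NEON reciprocal step,
 * which yields 2 - a*b in every lane. */
static void emu_vrecps_f32(const float a[LANES], const float b[LANES],
                           float out[LANES])
{
    for (int i = 0; i < LANES; ++i)
        out[i] = 2.0f - a[i] * b[i];
}

/* Hypothetical helper: one Newton-Raphson refinement of x as an
 * approximation of 1/d, i.e. x' = x * (2 - d*x). */
static void emu_recip_refine(const float d[LANES], float x[LANES])
{
    float step[LANES];
    emu_vrecps_f32(d, x, step);
    for (int i = 0; i < LANES; ++i)
        x[i] *= step[i];
}
```

Starting from a rough guess, each call to emu_recip_refine roughly squares the relative error, so a handful of iterations is enough for single precision. A backend lowering would presumably emit the equivalent multiply and subtract as vector instructions instead of a scalar loop.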
Steven Newbury
2013-Oct-02 09:12 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
How does this make any sense? NEON intrinsics are there to support code generation targeting the ARM NEON SIMD unit on the ARM architecture. Power/PowerPC has its own AltiVec/VSX SIMD units, which in turn have their own intrinsics.

If you want to write code that explicitly targets CPU execution units, it's necessarily tied to that specific CPU architecture. If you just want to test code written for a different CPU on a development box, your best bet is to use a VM like QEMU with CPU emulation.

If you want to write code that will take advantage of whatever SIMD hardware is available, you might want to try abstracting your implementation and using one of the many libraries which provide a higher-level API to SIMD-optimized functionality.
David Tweed
2013-Oct-02 10:40 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
(Note: these are personal opinions rather than anything from my employer.)

Although unusual, there might be circumstances in which it would make sense.

| If you want to write code that explicitly targets CPU execution units it's
| necessarily tied to that specific CPU architecture. If you just want to
| test code written for a different CPU on a development box your best
| bet is to use a VM like QEMU with CPU emulation.

You might either already have code to analyse, or intend to write code that will eventually be deployed on a particular mobile architecture but wish to develop it on a desktop machine. Running it under an architectural simulator will potentially cost more than having the compiler optimize as much of the emulation as possible at compile time. (Whether this is actually enough to justify all the work of writing an LLVM backend is another question, of course.)

Cheers,
Dave
Konstantin Tokarev
2013-Oct-02 10:57 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
02.10.2013, 14:46, "Steven Newbury" <steve at snewbury.org.uk>:
> If you want to write code that will take advantage of whatever SIMD
> hardware is available you might want to try abstracting your
> implementation and use one of the many libraries which provide a higher
> level API to SIMD optimized functionality.

For example, the Eigen library [1] supports both AltiVec and NEON.

[1] eigen.tuxfamily.org

--
Regards,
Konstantin
Konstantin Tokarev
2013-Oct-02 11:14 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
02.10.2013, 13:27, "Stanislav Manilov" <stanislav.manilov at gmail.com>:
> However, for NEON instructions like the reciprocal step (see
> infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHDIACI.html)
> it is unlikely that there is a corresponding PowerPC vector instruction,
> so these will need to be emulated, yes.

Here is an example implementation of reciprocal square root with AltiVec intrinsics:
web.archive.org/web/20090810124308/http://developer.apple.com/hardwaredrivers/ve/algorithms.html

--
Regards,
Konstantin
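For readers without the archived page to hand, the usual shape of such a routine is a hardware estimate followed by Newton-Raphson refinement. The sketch below shows only the refinement step in portable scalar C; it is not the code from the linked page, the name rsqrt_refine_f32 is hypothetical, and on real AltiVec hardware the initial guess would presumably come from the estimate intrinsic vec_rsqrte rather than from the caller.

```c
#define LANES 4  /* assumption: four 32-bit float lanes per vector */

/* Hypothetical helper: one Newton-Raphson step for y ~ 1/sqrt(x),
 * applied lane by lane: y' = y * (1.5 - 0.5 * x * y * y). */
static void rsqrt_refine_f32(const float x[LANES], float y[LANES])
{
    for (int i = 0; i < LANES; ++i)
        y[i] = y[i] * (1.5f - 0.5f * x[i] * y[i] * y[i]);
}
```

How many refinement steps are needed depends on the precision of the initial estimate, which differs between SIMD ISAs, so an emulation that must match NEON results closely would have to take that into account.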
Renato Golin
2013-Oct-02 11:17 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
> How does this make any sense?

I have to agree with you that this doesn't make much sense, but there is a case where you would want something like that: when the original source uses NEON intrinsics, and there is no alternative in AltiVec, AVX or even plain C.

We encourage people to use NEON intrinsics, as opposed to writing inline NEON assembly, when the compiler cannot vectorize their code properly. This may fix the current problem of under-performing, forward-incompatible inline asm, and it does solve the portability issue across ARM sub-architectures (e.g. v7 vs v8), but it doesn't help with portability across entirely different architectures.

Since it's not easy to vectorize all code, and it's not desirable to have special cases hard-coded in the vectorizer, I don't see another solution to this problem. Before, you'd have assembly files with NEON-specific code, others with AltiVec-specific code and so on, and now you'd have C files for each set of intrinsics, which is better. But, as you said yourself, the semantics of NEON instructions are not the same as those of other SIMD ISAs, so if you only have the NEON file and want to create an AltiVec version, you'll have to understand both pretty well.

Stanislav,

If I got it right above, I think it would be better if you could do that transformation in IR, with a mapping infrastructure between the SIMD ISAs: something that could represent every possible SIMD instruction and how each target represents it. On one side you read the intrinsics (and possibly IR operations on vectors) and translate them to this meta-SIMD language, then export to the SIMD language that you want.

A tool like this, possibly exporting back to C code (so you can add it to your project as a one-off pass), would be valuable to all programs that have legacy hard-coded SSE routines and need to run on any platform that supports SIMD operations.

I have no idea how easy it would be to do that, let alone if it's at all possible, but it seems that this is what you want. Correct me if I'm wrong.

cheers,
--renato
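As a rough illustration of the mapping idea, the table below pairs a few NEON float intrinsics with a neutral operation tag and the AltiVec intrinsic that appears to correspond. This is only a sketch under assumptions: the types simd_op and simd_mapping, the table k_simd_map and the function lookup_neon are hypothetical names, the pairing ignores differences in NaN handling and estimate precision, and a real tool would work on LLVM IR rather than on strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical target-neutral operation tags (the "meta-SIMD" layer). */
typedef enum {
    OP_ADD, OP_MAX, OP_MIN, OP_RECIP_EST, OP_RSQRT_EST, OP_RECIP_STEP
} simd_op;

typedef struct {
    simd_op     op;
    const char *neon;     /* NEON intrinsic on float32x4_t */
    const char *altivec;  /* AltiVec intrinsic, or NULL if it must be expanded */
} simd_mapping;

static const simd_mapping k_simd_map[] = {
    { OP_ADD,        "vaddq_f32",    "vec_add"    },
    { OP_MAX,        "vmaxq_f32",    "vec_max"    },
    { OP_MIN,        "vminq_f32",    "vec_min"    },
    { OP_RECIP_EST,  "vrecpeq_f32",  "vec_re"     },
    { OP_RSQRT_EST,  "vrsqrteq_f32", "vec_rsqrte" },
    /* No single counterpart assumed here: 2 - a*b would need expanding,
     * for example into a negated multiply-subtract with a splat of 2.0. */
    { OP_RECIP_STEP, "vrecpsq_f32",  NULL         },
};

/* Returns the table entry for a NEON intrinsic name, or NULL if unknown. */
static const simd_mapping *lookup_neon(const char *name)
{
    for (size_t i = 0; i < sizeof k_simd_map / sizeof k_simd_map[0]; ++i)
        if (strcmp(k_simd_map[i].neon, name) == 0)
            return &k_simd_map[i];
    return NULL;
}
```

Entries with a NULL target name mark exactly the cases discussed earlier in the thread, where the operation has to be emulated with a short instruction sequence instead of a one-to-one substitution.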