Renato Golin
2013-Oct-02 11:17 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:

> How does this make any sense?

I have to agree with you that this doesn't make much sense, but there is a case where you would want something like that: when the original source uses NEON intrinsics, and there is no alternative in AltiVec, AVX or even plain C.

We encourage people to use NEON intrinsics, as opposed to writing inline NEON assembly, when the compiler cannot vectorize their code properly. This may fix the current problem of under-performing, forward-incompatible inline asm, and it does solve the portability issue across ARM sub-architectures (e.g. v7 vs v8), but it doesn't help with portability across entirely different architectures.

Since it's not easy to vectorize all code, and it is not desirable to have special cases hard-coded in the vectorizer, I don't see another solution to this problem. Before, you'd have one assembly file with NEON-specific code, another with AltiVec-specific code and so on; now you'd have C files, one per set of intrinsics, which is better. But, as you said yourself, the semantics of NEON instructions are not the same as those of other SIMD ISAs, so if you only have the NEON file and want to create an AltiVec version, you'll have to understand both pretty well.

Stanislav,

If I got it right above, I think it would be better if you could do that transformation in IR, with a mapping infrastructure between each SIMD ISA. Something that could represent every possible SIMD instruction, and how each target represents them, so on one side you read the intrinsics (and possibly IR operations on vectors), translate to this meta-SIMD language, then export to the SIMD language that you want.

A tool like this, possibly exporting back to C code (so you can add it to your project as a one-off pass), would be valuable to all programs that have legacy hard-coded SSE routines and need to run on any platform that supports SIMD operations.

I have no idea how easy it would be to do that, let alone if it's at all possible, but it seems that this is what you want. Correct me if I'm wrong.

cheers,
--renato
Stanislav Manilov
2013-Oct-02 11:34 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 12:17, Renato Golin <renato.golin at linaro.org> wrote:

> On 2 October 2013 10:12, Steven Newbury <steve at snewbury.org.uk> wrote:
>
>> How does this make any sense?
>
> I have to agree with you that this doesn't make much sense, but there is a
> case where you would want something like that: when the original source
> uses NEON intrinsics, and there is no alternative in AltiVec, AVX or even
> plain C.

This is exactly the case that I am in. I want to make DSP code that is written in C, but with NEON intrinsics, "portable", as rewriting it is less feasible.

> Stanislav,
>
> If I got it right above, I think it would be better if you could do that
> transformation in IR, with a mapping infrastructure between each SIMD ISA.
> Something that could represent every possible SIMD instruction, and how
> each target represents them, so on one side you read the intrinsics (and
> possibly IR operations on vectors), translate to this meta-SIMD language,
> then export to the SIMD language that you want.
>
> A tool like this, possibly exporting back to C code (so you can add it to
> your project as a one-off pass), would be valuable to all programs that
> have legacy hard-coded SSE routines and need to run on any platform that
> supports SIMD operations.
>
> I have no idea how easy it would be to do that, let alone if it's at all
> possible, but it seems that this is what you want. Correct me if I'm wrong.

Again, the tool you describe is exactly what I ultimately want to create. The translation to AltiVec would be a step towards understanding how to manipulate the intrinsics, but it is not a goal on its own.

Do you have any ideas where in the whole LLVM structure it would fit (should it be implemented as a separate optional pass)?

Thanks,
- Stan
Renato Golin
2013-Oct-02 12:07 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
On 2 October 2013 12:34, Stanislav Manilov <stanislav.manilov at gmail.com> wrote:

> Again, the tool you describe is exactly what I ultimately want to create.
> The translation to AltiVec would be a step towards understanding how to
> manipulate the intrinsics, but it is not a goal on its own.
>
> Do you have any ideas where in the whole LLVM structure it would fit
> (should it be implemented as a separate optional pass)?

I think there are two separate things:

1. A conversion tool that will read specific SIMD-1 C files and produce SIMD-2 C files. This will need the C back-end to be working well, or will need to implement its own SIMD-specific C back-end, which is in itself quite a big task. This tool would have to use a function pass that scans for SIMD-1 intrinsics and converts them to SIMD-2 at the IR level, so your tool would read the SIMD-1 file as if it were targeting arch-2, and the conversion would happen automatically, using the function pass below.

2. A function pass to do the conversion from SIMD-1 intrinsics to SIMD-2, based on their original namespace inside LLVM (AVX, NEON, etc.) and the target parameter (for SIMD-2 output). This function pass should be off by default, of course, but could be turned on (say, -convert-simd-intrinsics) when compiling legacy code.

I'd start with just cataloguing all NEON and AltiVec intrinsics, and trying to map them. You'll probably hit cases where NEON A == AltiVec A + op1 + op2, so you'll have to treat the operations immediately before and after an intrinsic as possibly part of one interchangeable SIMD operation.

As a first example, you could write a function pass that handles only the intrinsics that map nicely 1-to-1 and see if the concept works, and if people are happy with your changes. It should be able to read a (very simple) NEON C file and produce compatible PowerPC AltiVec assembly code.

After the infrastructure is in place, you can keep extending it by adding support for more intrinsics, more SIMD ISAs, and more complex patterns (involving surrounding instructions, etc.).

In parallel, you could try to create the tool that would do the source-to-source transformation, using the pass that you have written.

Of course, adding tests for all known supported conversions to/from each ISA would be critical to the success of your project.

cheers,
--renato
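The cataloguing step described above can be sketched as a small lookup table in C. This is only an illustration under stated assumptions: the struct, table and function names (`simd_map_entry`, `neon_to_altivec`, `map_neon_to_altivec`) are hypothetical, the table uses C-level intrinsic names rather than LLVM IR intrinsic names, and it lists only a handful of cases that appear to map 1-to-1. AltiVec's `vec_*` functions are overloaded on the vector type, so a real catalogue would also have to record element types.

```c
#include <stddef.h>
#include <string.h>

/* One catalogue entry: a NEON intrinsic and its assumed 1-to-1 AltiVec
   counterpart. All names in this sketch are illustrative. */
struct simd_map_entry {
    const char *neon;
    const char *altivec;
};

/* A few 32-bit integer operations that look like direct matches. */
static const struct simd_map_entry neon_to_altivec[] = {
    { "vaddq_s32",  "vec_add"  },  /* lane-wise add            */
    { "vsubq_s32",  "vec_sub"  },  /* lane-wise subtract       */
    { "vmaxq_s32",  "vec_max"  },  /* lane-wise maximum        */
    { "vminq_s32",  "vec_min"  },  /* lane-wise minimum        */
    { "vqaddq_s32", "vec_adds" },  /* saturating lane-wise add */
};

/* Returns the AltiVec name for a NEON intrinsic, or NULL when the
   catalogue has no 1-to-1 entry (such intrinsics would need the
   multi-instruction patterns mentioned in the message above). */
static const char *map_neon_to_altivec(const char *neon_name)
{
    size_t n = sizeof neon_to_altivec / sizeof neon_to_altivec[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(neon_to_altivec[i].neon, neon_name) == 0)
            return neon_to_altivec[i].altivec;
    }
    return NULL;
}
```

A lookup such as `map_neon_to_altivec("vmaxq_s32")` yields `"vec_max"`, while an intrinsic that is not in the table, for example `vmlaq_s32`, yields `NULL` and would be left for the more complex pattern matching.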
Hal Finkel
2013-Oct-02 12:36 UTC
[LLVMdev] Implementing the ARM NEON Intrinsics for PowerPC
----- Original Message -----

> On 2 October 2013 12:17, Renato Golin < renato.golin at linaro.org > wrote:
>
>> On 2 October 2013 10:12, Steven Newbury < steve at snewbury.org.uk > wrote:
>>
>>> How does this make any sense?
>>
>> I have to agree with you that this doesn't make much sense, but there
>> is a case where you would want something like that: when the
>> original source uses NEON intrinsics, and there is no alternative in
>> AltiVec, AVX or even plain C.
>
> This is exactly the case that I am in. I want to make DSP code
> written in C, but with NEON intrinsics "portable" as it is less
> feasible to rewrite it.

Are you using Clang as the frontend? If so, my recommendation would be to start by creating a header file that implements the NEON intrinsics in terms of generic functionality and the AltiVec ones. The header file would need to look kind of like this:

    #if defined(__powerpc__) || defined(__ppc__)
    #define neon_intrinsic1 ppc_neon_intrinsic1
    static __inline__ vec_type __attribute__((__always_inline__, __nodebug__))
    ppc_neon_intrinsic1(vec_type a1, vec_type a2) {
      ...
    }
    ...
    #endif

If you look in tools/clang/lib/Headers you'll see lots of example intrinsics header files, and if you look in your build directory in tools/clang/lib/Headers you'll find the arm_neon.h.inc file.

You can certainly do this in terms of an LLVM transformation, but I think that creating some kind of header file would be, at least, where I'd start prototyping this.

Also, you'll want to make sure that the endianness of the ARM and PPC environments agree (or that the code is endian-neutral), otherwise you'll likely have bigger problems ;)

-Hal

>> Stanislav,
>>
>> If I got it right above, I think it would be better if you could do
>> that transformation in IR, with a mapping infrastructure between
>> each SIMD ISA. Something that could represent every possible SIMD
>> instruction, and how each target represents them, so in one side you
>> read the intrinsics (and possibly IR operations on vectors),
>> translate to this meta-SIMD language, then export on the SIMD
>> language that you want.
>>
>> A tool like this, possibly exporting back to C code (so you can add
>> it to your project as an one-off pass), would be valuable to all
>> programs that have legacy hard-coded SSE routines to run on any
>> platform that support SIMD operations.
>>
>> I have no idea how easy would be to do that, let alone if it's at all
>> possible, but it seems that this is what you want. Correct me if I'm
>> wrong.
>
> Again, the tool you describe is exactly what I ultimately want to
> create. The translation to AltiVec would be a step towards
> understanding how to manipulate the intrinsics, but it is not a goal
> on its own.
>
> Do you have any ideas where in the whole LLVM structure would it fit
> (should it be implemented as a separate optional pass)?
>
> Thanks,
> - Stan

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
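To make the header-file suggestion above a little more concrete, here is a minimal sketch of such a compatibility header for three NEON intrinsics. It is not taken from the thread: it assumes a GCC- or Clang-compatible compiler, uses the generic `vector_size` extension rather than AltiVec-specific calls, and the `compat_` prefix is hypothetical. On a target that already has NEON it simply includes the real `arm_neon.h`.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>   /* real NEON target: use the native header */
#else

/* Four 32-bit lanes in a 128-bit vector (GCC/Clang vector extension). */
typedef int32_t  compat_int32x4_t  __attribute__((vector_size(16)));
typedef uint32_t compat_uint32x4_t __attribute__((vector_size(16)));
#define int32x4_t compat_int32x4_t

/* vaddq_s32: lane-wise add. The add is done on unsigned lanes so that
   overflow wraps, which is the behaviour assumed for the NEON original. */
#define vaddq_s32 compat_vaddq_s32
static __inline__ compat_int32x4_t __attribute__((__always_inline__))
compat_vaddq_s32(compat_int32x4_t a, compat_int32x4_t b)
{
    return (compat_int32x4_t)((compat_uint32x4_t)a + (compat_uint32x4_t)b);
}

/* vdupq_n_s32: broadcast one scalar into all four lanes. */
#define vdupq_n_s32 compat_vdupq_n_s32
static __inline__ compat_int32x4_t __attribute__((__always_inline__))
compat_vdupq_n_s32(int32_t x)
{
    compat_int32x4_t r = { x, x, x, x };
    return r;
}

/* vgetq_lane_s32: read one lane by index. Lane numbering follows the
   host's element order, so the endianness caveat from the message
   above still applies. */
#define vgetq_lane_s32(v, lane) ((v)[(lane)])

#endif
```

With this in place, source such as `vaddq_s32(vdupq_n_s32(5), b)` compiles unchanged on a non-ARM host, and the compiler is left to choose the target's vector instructions. Intrinsics whose semantics have no generic equivalent would need hand-written, target-specific bodies inside the same guard.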