similar to: [LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64

Displaying 20 results from an estimated 800 matches similar to: "[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64"

2013 Jul 30
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
On Tue, Jul 30, 2013 at 01:14:16PM -0600, Dan wrote: > I'll try to run through the scenario: > > > 64-bit register type target (all registers have 64 bits). > > all 32-bits are getting promoted to 64-bit integers > > Problem: > > MUL on i32 is getting promoted to MUL on i64 > > MUL on i64 is getting expanded to a library call in compiler-rt > >
2013 Jul 31
2
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks for the information, allow maybe I can re-phrase the question or issue. Assume 64-bit register types, but integer is 32-bit. Already have table generation of the 64-bit operation descriptions. How about this modified approach? Before type-legalization, I'd really like to move all MUL I64 to a subroutine call of my own choice. This would be a form of customization, but I want this
2013 Jul 31
0
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Hi Dan, If you set the node's action to "Custom", you should be able to interfere in the type legalisation phase (before it gets promoted to a 64-bit MUL) by overriding the "ReplaceNodeResults" function. You could either expand it to a different libcall directly there, or replace it with a target-specific node (say XXXISD::MUL32) which claims to take i64 types but you
2013 Jul 31
1
[LLVMdev] Help with promotion/custom handling of MUL i32 and MUL i64
Thanks Tom. I really appreciate your insight. I'm able to use the customize to get the 64-bit to go to a subroutine and for the 32-bit, I am generate XXXISD::MUL32. I'm not sure then what you mean about "overriding" the ReplaceNodeResults. For ReplaceNodeResults, I'm doing: SDValue Res = LowerOperation(SDValue(N, 0), DAG); for (unsigned I = 0, E =
2009 May 21
0
[LLVMdev] [PATCH] Add new phase to legalization to handle vector operations
On Wed, May 20, 2009 at 4:55 PM, Dan Gohman <gohman at apple.com> wrote: > Can you explain why you chose the approach of using a new pass? > I pictured removing LegalizeDAG's type legalization code would > mostly consist of finding all the places that use TLI.getTypeAction > and just deleting code for handling its Expand and Promote. Are you > anticipating something more
2009 May 20
2
[LLVMdev] [PATCH] Add new phase to legalization to handle vector operations
On May 20, 2009, at 1:34 PM, Eli Friedman wrote: > On Wed, May 20, 2009 at 1:19 PM, Eli Friedman > <eli.friedman at gmail.com> wrote: > >> Per subject, this patch adding an additional pass to handle vector >> >> operations; the idea is that this allows removing the code from >> >> LegalizeDAG that handles illegal types, which should be a significant
2012 Nov 16
2
[LLVMdev] Operand order in dag pattern matching in td files
Hi, I have a simple question w.r.t the order of operands used in dag pattern matching in target files. Some of them seem intuitive. But I want to get it clarified anyway. I am using a pattern from X86InstrFMA.td in the below example. Consider FMA3 pattern (simplified). let Constraints = "$src1 = $dst" in { multiclass fma3s_rm<bits<8> opc, string OpcodeStr, X86MemOperand
2012 Nov 16
0
[LLVMdev] Operand order in dag pattern matching in td files
On 16 November 2012 13:41, Anitha B Gollamudi <anitha.boyapati at gmail.com> wrote: > Hi, > > I have a simple question w.r.t the order of operands used in dag > pattern matching in target files. Some of them seem intuitive. But I > want to get it clarified anyway. I am using a pattern from > X86InstrFMA.td in the below example. Consider FMA3 pattern > (simplified). >
2017 Oct 05
3
Bug 20871 -- is there a fix or work around?
Looks like I have run into the same issue reported in: https://bugs.llvm.org/show_bug.cgi?id=20871 Is there a fix or work-around for it? The bug report seems to be still open. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171005/46c1282d/attachment.html>
2012 Nov 16
1
[LLVMdev] Operand order in dag pattern matching in td files
You've unfortunately chosen a complex example. Your second question is needs be answered first. null_frag causes the pattern to be dropped. Now having covered that the reason the operands are in the order they are is because the only instruction that doesn't use null_frag is this one defm r213 : fma3s_rm<opc213, !strconcat(OpStr, !strconcat("213", PackTy)),
2010 Mar 17
2
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
I'm building tool-chain for processor without integer MUL. So, I've defined __mulsi3 for integer multiplication (int32). Now I've got a problem with int64 multiplication which is implemented in libgcc2.c. Segfualt due to infinite recursion in i64 soft multiplication (libgcc2, __muldi3). LLVM-GCC (for my target) misoptimizes code if -O2 is passed. It promotes i32 multiplication to
2010 Mar 17
0
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
On Wed, Mar 17, 2010 at 1:32 PM, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote: > I'm building tool-chain for processor without integer MUL. > So, I've defined __mulsi3 for integer multiplication (int32). > > Now I've got a problem with int64 multiplication which is implemented > in libgcc2.c. > Segfualt due to infinite recursion in i64 soft
2010 Mar 17
2
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
On Wed, Mar 17, 2010 at 3:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Wed, Mar 17, 2010 at 1:32 PM, Sergey Yakoushkin > <sergey.yakoushkin at gmail.com> wrote: >> I'm building tool-chain for processor without integer MUL. >> So, I've defined __mulsi3 for integer multiplication (int32). >> >> Now I've got a problem with int64
2010 Mar 17
0
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
Thanks, yes, I'm facing the same issue. Hm... seems there are no simple fixes. I have to do one more i64 mul implementation to workaround aggressive optimizations. Is that correct? Is this the only way? Can I disable only one particular pass which does this promotion from i32 to i64 using some LLVM-GCC option? Are there other libgcc functions affected by this optimization? Regards, Sergey
2010 Mar 17
0
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
> This shouldn't be necessary, IMO. If you were going to implement it, > then the correct thing to do would be to have generic selection dag > lowering of large multiplies, which renders the library mostly > useless. In fact, I would prefer to avoid custom lowering for operations on large types. i64 will be rare in my case (embedded) and their performance is not an issue. I need
2010 Mar 17
2
[LLVMdev] llvm-gcc promotes i32 mul to i64 inside __muldi3
On Wed, Mar 17, 2010 at 4:57 PM, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote: > Thanks, yes, I'm facing the same issue. > > Hm... seems there are no simple fixes. > I have to do one more i64 mul implementation to workaround aggressive > optimizations. > Is that correct? Is this the only way? This shouldn't be necessary, IMO. If you were going to
2020 May 19
5
[PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
I've read up some more on the subject, and it seems the proper way to do this with GCC is g++ and target attributes. I've refactored the patch that way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless of the build host, so this should be ideal both for home builders and distros. Getting the code to build right in c++ mode (checksum_sse2.cpp only) was a bit of an
2020 May 18
6
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
This drop-in patch increases the performance of the get_checksum1() function on x86-64. On the target slow CPU performance of the function increased by nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the compiler was told to enable SSSE3 support. The increase was over 200% on the fastest CPU tested in SSSE3 mode. Transfer time improvement with large files existing on both ends
2020 May 18
3
[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That reads to me like we're fine for SSE2. As stated in my comments, SSSE3 support must be
2017 Oct 07
2
Bug 20871 -- is there a fix or work around?
Ignore the suggested fix in my earlier post. How about this? diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp index 20c81c3..b8ebf42 100644 --- a/lib/Target/X86/X86ISelLowering.cpp +++ b/lib/Target/X86/X86ISelLowering.cpp @@ -1632,10 +1632,11 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM, if (!Subtarget.is64Bit()) { // These