similar to: Signed Division and InstCombine

Displaying 20 results from an estimated 1000 matches similar to: "Signed Division and InstCombine"

2016 May 31
1
Signed Division and InstCombine
On 31 May 2016 at 16:02, Dilan Manatunga <manatunga at gmail.com> wrote: > Just to verify, a 16-bit division of INT16_MIN by -1 results in INT16_MIN > again? No, "sdiv i16 -32768, -1" is undefined behaviour. The version with an "sext" and "trunc" avoids the undefined behaviour and does return -32768. > If the issue only occurs in this case, why
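To make Tim's point concrete, here is a small C sketch of the "sext, sdiv i32, trunc" route. It is not LLVM code, and the helper name `sdiv16_via_i32` is hypothetical. Widening first means INT16_MIN / -1 produces 32768, which fits in 32 bits. Truncating back to 16 bits then wraps it to -32768, so the whole sequence is defined, unlike a plain `sdiv i16`.

```c
#include <stdint.h>

/* Hypothetical helper: a 16-bit signed division done via sext -> sdiv i32 -> trunc.
   The divisor b must be non-zero (division by zero is undefined in both forms). */
static int16_t sdiv16_via_i32(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a / (int32_t)b; /* sext + sdiv i32: INT16_MIN / -1 == 32768, no overflow */
    uint16_t bits = (uint16_t)wide;         /* trunc: conversion to unsigned is defined modulo 2^16 */
    /* Reinterpret the 16 bits as two's complement without relying on
       implementation-defined signed narrowing. */
    return bits >= 0x8000u ? (int16_t)((int32_t)bits - 0x10000) : (int16_t)bits;
}
```

On targets where `int` is wider than 16 bits, plain C `a / b` on two `int16_t` values already takes this route, because both operands are promoted to `int` before the division.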
2016 May 31
2
Signed Division and InstCombine
On 31 May 2016 at 15:42, Tim Northover <t.p.northover at gmail.com> wrote: > A 16-bit division of INT16_MIN by -1 is undefined behaviour but the > original ext/trunc version is well-defined as 0. Sorry, INT16_MIN again actually. The main point still stands though, I think. Tim.
2016 May 31
0
Signed Division and InstCombine
Just to verify, a 16-bit division of INT16_MIN by -1 results in INT16_MIN again? If the issue only occurs in this case, why aren't there checks to see if we can simplify sdiv in cases where we know that the numerator is not INT16_MIN or the denominator is not -1? For example, we could simplify divides where one operand is a constant. Is it because this case is most likely rare? -Dilan On Tue,
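Dilan's suggestion can be sketched as a predicate. The narrowing is only unsafe when the numerator may be INT16_MIN and the divisor may be -1 at the same time, so proving either one impossible is enough. The C below is a hypothetical model: `operand_info` and `can_narrow_sdiv_to_i16` are made up here and are not InstCombine code. It only understands exact constants, whereas a real combine would more likely consult known-bits or range information.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what a combine "knows" about an operand:
   either nothing, or its exact (sign-extended) constant value. */
typedef struct {
    bool is_const;
    int32_t value; /* meaningful only when is_const is true */
} operand_info;

/* Narrowing sext/sdiv/trunc to a 16-bit sdiv is only unsafe when the
   numerator can be INT16_MIN *and* the divisor can be -1 simultaneously.
   Ruling out either one suffices. Division by zero is undefined in both
   the wide and the narrow form, so it does not block the transform. */
static bool can_narrow_sdiv_to_i16(operand_info num, operand_info den)
{
    bool num_not_min  = num.is_const && num.value != INT16_MIN;
    bool den_not_neg1 = den.is_const && den.value != -1;
    return num_not_min || den_not_neg1;
}
```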
2017 Jul 21
4
Issue with DAG legalization of brcond, setcc, xor
But isn't it kind of silly that we transform to xor and then transform it back? What is the advantage in doing so? Also, because it works that way, I now have to introduce setcc patterns for i1 values, instead of being able to just use logical pattern operators like not. -Dilan On Fri, Jul 21, 2017 at 11:00 AM Dilan Manatunga <manatunga at gmail.com> wrote: > For some reason I
2017 Jul 31
2
X86 Backend SelectionDAG - Source Scheduling
Thanks, that clears things up. So if I want to mess around with how schedules are generated, the MachineScheduler pass is the best place to look now? -Dilan On Mon, Jul 31, 2017 at 3:24 PM Matthias Braun <mbraun at apple.com> wrote: > > > On Jul 31, 2017, at 2:51 PM, Dilan Manatunga via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > Hi, > >
2017 Jul 20
3
Issue with DAG legalization of brcond, setcc, xor
Hi, I am having some issues with how some of the instructions are being legalized. This is my initial basic block; the area of concern is the last three instructions. I will pick and choose debug output to keep this small.
SelectionDAG has 36 nodes:
  t0: ch = EntryToken
  t6: i32,ch = CopyFromReg t0, Register:i32 %vreg507
  t2: i32,ch = CopyFromReg t0, Register:i32 %vreg17
2016 Jun 02
4
Lowering For Loops to use architecture "loop" instruction
Hi, I'm working on a project which involves writing a backend for a hypothetical architecture. I am currently trying to figure out the best way to translate for loops to use a specialized "loop" instruction the architecture supports. The instruction is similar to X86's loop instruction, where a register is automatically decremented and the condition is automatically checked to see if
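As a rough illustration of what such a lowering has to produce, here is a C model (not backend code) of a counted loop reshaped for a decrement-and-branch instruction. It assumes x86-LOOP-like semantics: decrement the counter register, then branch back if it is non-zero. The function names are hypothetical. The transformation needs two things. First, a trip count computed before the loop. Second, a guard for the zero-trip case, because a bottom-tested decrement-and-branch runs the body once before checking, and a zero counter would wrap around.

```c
/* Counted loop as written in the source: for (i = 0; i < n; i++). */
static long sum_for(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop reshaped for a hypothetical decrement-and-branch "loop"
   instruction with x86-LOOP-like semantics. */
static long sum_hwloop(const int *a, int n)
{
    long s = 0;
    if (n <= 0)
        return s;                  /* guard: skip the loop for zero trips */
    unsigned count = (unsigned)n;  /* trip count loaded into the counter register */
    const int *p = a;              /* index replaced by a pointer induction variable */
    do {
        s += *p++;                 /* loop body */
    } while (--count != 0);        /* "loop": decrement counter, branch back if non-zero */
    return s;
}
```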
2017 Jul 07
2
Lowering Select to Two Predicated Movs
Ohh, that makes sense. So is the reason the first instruction doesn't get deleted that the ExpandPseudoInstructions pass runs after register allocation and machine dead code elimination? -Dilan On Fri, Jul 7, 2017 at 12:37 PM Friedman, Eli <efriedma at codeaurora.org> wrote: > On 7/7/2017 12:10 PM, Dilan Manatunga wrote: > > My bad for not looking further. I'm still
2017 Jul 07
2
Lowering Select to Two Predicated Movs
My bad for not looking further. I'm still somewhat confused, though. MOVCCr gets expanded in the ARMExpandPseudoInsts pass, and it still seems to be only a case of one instruction replacing the other. My worry about emitting two instructions is that a dead code pass will eliminate the first instruction because it thinks the second instruction defines the same register. -Dilan On Fri, Jul 7, 2017
2017 Jul 07
2
Lowering Select to Two Predicated Movs
Hi, I was wondering what would be the best way to lower a select operation to two predicated movs. I looked through the ARM, MIPS, and NVPTX backends and they all seem to lower a select to some sort of conditional move or native select operation. Ex.
  select t3, cond, t2, t1
becomes
  cond  mov t3, t2
  !cond mov t3, t1
-Dilan
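One way to see why the first mov need not be dead: a predicated mov whose predicate is false leaves the destination unchanged, so it effectively reads the old destination value. The C below is a hypothetical model of that behaviour; the names are made up for illustration. In a backend this is typically expressed by giving the predicated move an extra input for the old destination, for example a tied operand. Liveness analysis then sees that the second mov uses the first mov's result.

```c
#include <stdbool.h>

/* Hypothetical model of a predicated move "pred mov dst, src": when the
   predicate is false the destination keeps its previous value, so the
   old value of dst is an implicit input of the instruction. */
static int pred_mov(bool pred, int old_dst, int src)
{
    return pred ? src : old_dst;
}

/* select t3, cond, t2, t1 lowered as two predicated movs. The second mov
   reads t3's previous value, so the first mov is not a dead definition. */
static int lower_select(bool cond, int t2, int t1)
{
    int t3 = 0;                   /* initial contents are irrelevant to the result */
    t3 = pred_mov(cond, t3, t2);  /* cond  mov t3, t2 */
    t3 = pred_mov(!cond, t3, t1); /* !cond mov t3, t1 */
    return t3;
}
```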
2016 May 31
0
Signed Division and InstCombine
Hi Dilan, On 31 May 2016 at 15:34, Dilan Manatunga via llvm-dev <llvm-dev at lists.llvm.org> wrote: > What is the reason for the exclusion of sdiv from the operations considered > valid for execution in a truncated format. A 16-bit division of INT16_MIN by -1 is undefined behaviour but the original ext/trunc version is well-defined as 0. Cheers. Tim.
2017 Jul 31
2
X86 Backend SelectionDAG - Source Scheduling
Hi, I was looking into how SelectionDAG scheduling is done in LLVM for different backends, and I noticed that for the X86 backend, even though it sets a scheduling preference of ILP or RegisterPressure depending on the architecture, it ends up using source scheduling. I realized this is because it overrides enableMachineScheduler to return true. Are there any specific reasons why it was
2017 Jul 09
2
Loop branching inefficiencies in Backend output
Hi, I am working on a custom backend, and I am trying to figure out how to deal with some branching inefficiencies in my output code, and the best way to fix them. So, let's say I am compiling a small function that takes the sum of an array:
  int loop(int* array, int n) {
    int ret = 0;
    for (int i = 0; i < n; i++) {
      ret += array[i];
    }
    return ret;
  }
The problem I am having is that
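The message is cut off before the problem itself is described. For reference, here are two hand-written C shapes of the function above, with the basic blocks spelled out as `goto`s; the function names are hypothetical. The top-tested form costs a conditional branch plus an unconditional jump per iteration. The rotated form (guard plus bottom test), which is the shape LLVM's loop rotation pass aims to produce, needs only one conditional branch per iteration.

```c
/* Top-tested shape: a conditional branch at the header plus an
   unconditional jump back, i.e. two branches per iteration. */
static int loop_top_tested(const int *array, int n)
{
    int ret = 0, i = 0;
header:
    if (!(i < n)) goto done;  /* conditional branch out of the loop */
    ret += array[i];
    i++;
    goto header;              /* unconditional branch back */
done:
    return ret;
}

/* Rotated shape: one guard before the loop, then a single conditional
   branch at the bottom of each iteration. */
static int loop_rotated(const int *array, int n)
{
    int ret = 0, i = 0;
    if (!(i < n)) goto done;  /* guard, executed once */
body:
    ret += array[i];
    i++;
    if (i < n) goto body;     /* the only branch per iteration */
done:
    return ret;
}
```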
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(
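For readers unfamiliar with the SILK macros: as far as I know, the arithmetic of `silk_SMLAWB(a, b, c)` used in this kernel is `a + ((b * (opus_int16)c) >> 16)`. That is a 32x16-bit multiply that keeps the top 32 bits of the 48-bit product and accumulates the result into `a`. The C below is a reference sketch with a hypothetical name, and the upstream macro may be written differently (e.g. via split 16-bit partial products). It does not handle overflow of the final addition. The right shift of a negative `int64_t` assumes an arithmetic shift; that behaviour is implementation-defined in C, but it is what mainstream compilers do.

```c
#include <stdint.h>

/* Hypothetical reference for the SMLAWB-style operation:
   a + ((b * signed_low16(c)) >> 16). */
static int32_t smlawb_ref(int32_t a, int32_t b, int32_t c)
{
    int32_t c16 = c & 0xFFFF;          /* bottom 16 bits of c ... */
    if (c16 >= 0x8000)
        c16 -= 0x10000;                /* ... interpreted as a signed 16-bit value */
    int64_t prod = (int64_t)b * c16;   /* at most 48 significant bits */
    return a + (int32_t)(prod >> 16);  /* keep bits 16..47 and accumulate */
}
```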
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to that of the last patch). We have done the same internal testing as usual. I also attached 2 failed temporary versions which try to reduce code size (for code review reference only). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone, I think I have found a GVN / alias analysis related bug, but before opening an issue on the tracker I wanted to see if I am missing something. I have the following testcase: define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) { > entry: > ; Just some temporary storage > %tmp.0 = alloca i32 > %tmp.1 = alloca i32 > %tmp.i =
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:
  Complexity 5: 6.1%
  Complexity 6: 5.8%
  Complexity 8: 5.5%
  Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5. Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
this is definitely a bug in AA. 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) { 226 const Value *Arg = *I; 227 if (!Arg->getType()->isPointerTy()) -> 228 continue; 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, CS2ArgIdx, TLI);