similar to: GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics

2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
this is definitely a bug in AA.

     for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) {
       const Value *Arg = *I;
       if (!Arg->getType()->isPointerTy())
->       continue;
       unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I);
       auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, CS2ArgIdx, TLI);
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Okay, so then it sounds like, for now, the right fix is to stop marking masked.gather and masked.scatter with intrarg* options.

On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> wrote:
> We might have specification bug here, but we appear to implement what we
> specified. argmemonly is specified as only considering pointer typed
> arguments. It's
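To make the point about pointer-typed arguments concrete, here is a minimal sketch of why a loop shaped like the one quoted in the first result can miss a scatter/gather operand. It is a toy model only: `ToyType`, `ToyArg`, `pointerOnlyArgs`, `pointerOrVectorArgs`, and `toyScatterArgs` are hypothetical names invented for illustration, not LLVM APIs. The second function is shown only for contrast; the fix proposed in the thread was to stop marking the intrinsics, not to change the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR type category; this is not LLVM's Type class.
enum class ToyType { Other, Pointer, VectorOfPointers };

struct ToyArg {
    std::string name;
    ToyType type;
};

// Same shape as the quoted loop: anything that is not a scalar pointer is skipped.
std::vector<std::size_t> pointerOnlyArgs(const std::vector<ToyArg>& args) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].type != ToyType::Pointer)
            continue;  // a vector of pointers falls through here unnoticed
        indices.push_back(i);
    }
    return indices;
}

// Contrast case (an assumption, not the adopted fix): also count vectors of pointers.
std::vector<std::size_t> pointerOrVectorArgs(const std::vector<ToyArg>& args) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].type == ToyType::Pointer ||
            args[i].type == ToyType::VectorOfPointers)
            indices.push_back(i);
    }
    return indices;
}

// Toy argument list modelled on a scatter call: (value, ptrs, align, mask).
std::vector<ToyArg> toyScatterArgs() {
    return {{"value", ToyType::Other},
            {"ptrs", ToyType::VectorOfPointers},
            {"align", ToyType::Other},
            {"mask", ToyType::Other}};
}

// Usage check: the pointer-only filter sees no memory-carrying argument at all.
inline void checkToyScatter() {
    assert(pointerOnlyArgs(toyScatterArgs()).empty());
    assert(pointerOrVectorArgs(toyScatterArgs()).size() == 1);
}
```

In this toy model the pointer-only filter reports no memory arguments for the scatter call, so an analysis that trusts it would conclude the call touches nothing it needs to track.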
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
+ a few others. After following this rabbit hole a bit, there are a lot of mutually recursive calls, etc, that may or may not do the right thing with vectors of pointers. I can fix *this* particular bug with the attached patch. However, it's mostly papering over stuff. Nothing seems to know what to do with a memorylocation that is a vector of pointers. They all expect memorylocation to be a
2016 Aug 30
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message -----
> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Philip Reames" <listmail at philipreames.com>, "Davide Italiano"
> <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer"
>
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix; I can no longer reproduce the issue. As far as releases go, I am guessing that this is going to be in 4.0?

Best,
Chris

On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> Yeah, i just hope it doesn't regress scatter/gather vector code badly.
> But at least it's correct now?
>
>
> On Tue, Aug 30, 2016 at 1:11
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Great, thank you!

On Wed, Aug 31, 2016 at 2:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ------------------------------
>
> *From: *"Chris Sakalis" <chrissakalis at gmail.com>
> *To: *"Daniel Berlin" <dberlin at dberlin.org>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" <
> david.majnemer
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,

Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is

    for( n = 0; n < length; n++ ) {
        tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
        /* Loop over allpass sections */
        for( i = 0; i < order; i++ ) {
            /* Output of allpass section */
            tmp2_QS = silk_SMLAWB(
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 or 24 according to the complexity parameter. It's hard to write a universal function that handles all these orders efficiently. Any suggestions?

Thanks,
Linfeng

On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,

Thanks for your suggestions. Will get back to you once we have some updates.

Linfeng

On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to that of the last patch). We have done the same internal testing as usual. Also attached are 2 failed temporary versions that try to reduce code size (for code review reference only). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!

The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:

Complexity 5: 6.1%
Complexity 6: 5.8%
Complexity 8: 5.5%
Complexity 10: 4.0%

when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5

Thanks,
Linfeng

On Wed, Apr 5, 2017 at 11:02 AM,
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hello,

While investigating one of the existing tests (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some interesting code. The IR is very straightforward:

define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
  ret i32 %a3
}

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
  %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hey Eli,

On Thu, Feb 14, 2013 at 5:45 PM, Eli Bendersky <eliben at google.com> wrote:
> Hello,
>
> While investigating one of the existing tests
> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
> interesting code. The IR is very straightforward:
>
> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32
> %a4) {
> entry:
>
2015 Dec 01
11
[PATCH 1/6] x86: Add VMWare Host Communication Macros
These macros will be used by multiple VMWare modules for handling host communication.

v2:
* Keeping only the minimal common platform defines
* added vmware_platform() check function

v3:
* Added new field to handle different hypervisor magic values

Signed-off-by: Sinclair Yeh <syeh at vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom at vmware.com>
Reviewed-by: Alok N Kataria
2013 Feb 15
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
>> While investigating one of the existing tests
>> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
>> interesting code. The IR is very straightforward:
>>
>> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32
>> %a4) {
>> entry:
>> ret i32 %a3
>> }
>>
>> define fastcc i32 @tailcaller(i32
2008 Dec 30
2
[LLVMdev] Folding vector instructions
Hello. Sorry, I am not sure whether this question should go to the llvm or the mesa3d-dev mailing list, so I am posting it to both. I am writing an LLVM backend for a modern graphics processor which has an ISA very similar to that of Direct 3D. I am reading the code in the Gallium-3D driver in a mesa3d branch, which converts the shader programs (TGSI tokens) to LLVM IR. For the shader instruction also found in LLVM IR,
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
When you enable -tailcallopt you get support for tail calls between functions with arbitrary stack space requirements. That means the calling convention has to change slightly. E.g., the callee is responsible for removing its arguments off the stack. The caller cannot transitively know the tail callee's tailcallee's requirement. Also care must be taken to make sure the stack stays
2013 Feb 15
1
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hi Arnold,

Thanks for the insights. My comments below:

On Thu, Feb 14, 2013 at 5:30 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> When you enable -tailcallopt you get support for tail calls between functions with arbitrary stack space requirements. That means the calling convention has to change slightly. E.g., the callee is responsible for removing its arguments off
2010 Jan 13
4
merging issue.........
Hi,

I have a question about merging two files. For example, I have two files. The first file is like the following:

id  trait1
1   10.2
2   11.1
3   9.7
6   10.2
7   8.9
10  9.7
11  10.2

The second file is like the following:

id  trait2
1   9.8
2   10.8
4   7.8
5   9.8
6   10.1
12  10.2
13  10.1

Now I want to merge the two files by the variable "id", I only want