similar to: [LLVMdev] RFC: new intrinsic llvm.memcmp?

2010 Aug 20
0
[LLVMdev] RFC: new intrinsic llvm.memcmp?
On Fri, Aug 20, 2010 at 1:03 PM, Bagel <bagel99 at gmail.com> wrote: > I propose a new intrinsic "llvm.memcmp" that compares a block of memory > for equality (a subset of the libc behavior).  Backends are free to use the > alignment to optimize using wider than byte operations.  Since the result is > only equal/not-equal, byte order is not important. > > For
2010 Aug 20
1
[LLVMdev] RFC: new intrinsic llvm.memcmp?
On 08/20/2010 04:06 PM, Eli Friedman wrote: > On Fri, Aug 20, 2010 at 1:03 PM, Bagel<bagel99 at gmail.com> wrote: >> I propose a new intrinsic "llvm.memcmp" that compares a block of memory >> for equality (a subset of the libc behavior). Backends are free to use the >> alignment to optimize using wider than byte operations. Since the result is >> only
2009 Jul 24
4
[LLVMdev] llvm-as regression
The following causes an assertion in recent svn pulls, but not in 2.5. The assertion: llvm-as: /home/bgl/work/llvm-work/include/llvm/ADT/SmallVector.h:125: T& llvm::SmallVectorImpl<T>::operator[](unsigned int) [with T = llvm::Constant*]: Assertion `Begin + idx < End' failed. The .ll code: target datalayout =
2014 Sep 11
2
[LLVMdev] Is shortening a load a bug?
When the IR specifies a 32 bit load can it be changed to a narrower load? What if the load is from memory (e.g. a peripheral) that only supports 32-bit access? Consider the following IR: ---- target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32" target triple = "thumbv7m-unknown-unknown" @f = external global i32 define zeroext i8 @bar() nounwind { L.0:
2017 May 19
4
memcmp code fragment
Hi, Look at the following C code sequence: unsigned char mainGtU ( unsigned int i1, unsigned int i2, unsigned char* block) { unsigned char c1, c2; c1 = block[i1]; c2 = block[i2]; if (c1 != c2) return (c1 > c2); i1++; i2++; c1 = block[i1]; c2 = block[i2]; if (c1 != c2) return (c1 > c2); i1++; i2++; .. ..
2016 Dec 30
2
RFC: Inline expansion of memcmp vs call to standard library
Can I make another suggestion: create an intrinsic for memory equality, e.g. llvm.memcmp_eq.p0i8.p0i8.i64(i8*a, i8*b, i64 len). This intrinsic would return zero if the memory regions are equal, and nonzero otherwise. However, it does NOT return any notion of "greater" or "less". Many applications require only determining equality, rather than a total ordering. Given that
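The equality-only semantics proposed in this message can be sketched in plain C. This is only an illustration, not the proposed intrinsic or any LLVM API: the name mem_neq is hypothetical, and the 8-byte chunk size is an assumption about what a backend might choose. Because the result carries no ordering, the chunks can be XORed together without regard to byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an LLVM API): returns 0 if the two regions are
 * equal and 1 otherwise. It reports no "greater" or "less" ordering. */
static int mem_neq(const void *a, const void *b, size_t len)
{
    assert(len == 0 || (a != NULL && b != NULL));

    const unsigned char *pa = a;
    const unsigned char *pb = b;
    uint64_t diff = 0;

    /* Compare eight bytes at a time. memcpy keeps the loads valid for any
     * alignment; compilers typically turn it into a single wide load. */
    while (len >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        diff |= wa ^ wb;            /* any differing bit stays set */
        pa += sizeof wa;
        pb += sizeof wb;
        len -= sizeof wa;
    }

    /* Compare the remaining tail bytes one at a time. */
    while (len > 0) {
        diff |= (uint64_t)(*pa ^ *pb);
        pa++;
        pb++;
        len--;
    }

    return diff != 0;
}
```

This sketch accumulates differences instead of exiting early, which keeps the loop branch-free; an implementation that returns at the first mismatching chunk would be equally valid for an equal/not-equal result.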
2016 Dec 30
0
RFC: Inline expansion of memcmp vs call to standard library
With the intrinsic support for ‘memcpy’ and ‘memset’ the operands also have associated alignment operands. I think that ‘memcmp’ should also provide the alignment information for each of the source operands (when statically known). In some cases this will lead to more optimal alignment aware lowering, and for targets for which unaligned access is costly or fatal, it can be lowered safely.
2016 Dec 29
0
RFC: Inline expansion of memcmp vs call to standard library
Improving lowering for memcmp is definitely something we should do for all targets. Doing it in a target specific way is decidedly non-ideal. It looks like we already have some code in SelectionDAGBuilder which tries to optimize the lowering for the memcpy library call. I am a bit confused by the problem you are trying to solve. Are you specifically interested in lowering for constant
2014 Sep 12
2
[LLVMdev] Is shortening a load a bug?
On 09/11/2014 05:33 PM, Quentin Colombet wrote: > Hi Brian, > > On Sep 11, 2014, at 3:03 PM, Bagel <bagel99 at gmail.com> wrote: > >> When the IR specifies a 32 bit load can it be changed to a narrower >> load? What if the load is from memory (e.g. a peripheral) that only >> supports 32-bit access? Consider the following IR: ---- target datalayout >> =
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: movaps 88(%rdx), %xmm0 where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16 byte aligned which 88 plus something aligned to 4 byte isn't. Here the
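The alignment arithmetic behind this report can be checked with a small C sketch. The helper name is_aligned and the sample base addresses are hypothetical; the offset 88 and the 16-byte requirement come from the message. Since 88 mod 16 is 8, the effective address is 16-byte aligned only when the base address itself is 8 modulo 16, which 4-byte alignment does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of alignment.
 * alignment must be a nonzero power of two. */
static int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Returns 1 if an access at base + 88 satisfies a 16-byte alignment
 * requirement, as an aligned 128-bit load would need. */
static int offset88_is_16_aligned(uintptr_t base)
{
    return is_aligned(base + 88, 16);
}
```

For example, a base of 0x1000 (16-byte aligned) gives 0x1058, which is not 16-byte aligned, while a base of 0x1008 gives 0x1060, which is.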
2017 Feb 17
2
multiprecision add/sub
On 02/16/2017 12:08 PM, Stephen Canon wrote: >> On Feb 16, 2017, at 9:12 AM, Bagel <bagel99 at gmail.com> wrote: >> >> I figured that the optimization of this would be difficult (else it would >> have already been done :-)) > > Don’t make this assumption. There’s lots of opportunities for optimization > scattered
2008 Apr 13
1
[LLVMdev] Is there a reason why memcmp isn't an intrinsic?
Chris Lattner wrote: > On Apr 13, 2008, at 12:40 PM, Talin wrote: > > >> Since you have memcpy, memmove, and memset in there, I was wondering >> why >> memcmp wasn't there as well. It seems obvious - which makes me think >> that if it's not there, then there must be some reason for it. >> > > Why do you want it to be an intrinsic?
2010 Jun 08
2
[LLVMdev] possible regression regarding bitcasts?
The following code works on 2.7, but causes an assertion on recent snapshots. Has something changed regarding bitcasts that makes this illegal? The code is: target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32" target triple = "x86-unknown-unknown" @aa = global [32 x i8] zeroinitializer, align 1 @bb = global [16 x i8] zeroinitializer, align 1 define
2015 Jul 27
3
[LLVMdev] i1* function argument on x86-64
I am running into a problem with 'i1*' as a function's argument which seems to have appeared since I switched to LLVM 3.6 (but can have other source, of course). If I look at the assembler that the MCJIT generates for an x86-64 target I see that the array 'i1*' is taken as a sequence of 1 bit wide elements. (I guess that's correct). However, I used to call the function
2017 Mar 07
2
multiprecision add/sub
> On Feb 21, 2017, at 9:54 PM, Nemanja Ivanovic via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I believe that providing additional intrinsics that would directly produce the ISD::ADDC/ISD::SUBC nodes would provide the additional advantage of being able to directly produce these nodes for code that doesn't have anything to do with multiprecision addition/subtraction. I am
2008 Apr 13
2
[LLVMdev] Is there a reason why memcmp isn't an intrinsic?
Since you have memcpy, memmove, and memset in there, I was wondering why memcmp wasn't there as well. It seems obvious - which makes me think that if it's not there, then there must be some reason for it. -- Talin
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext
2010 Nov 23
3
[LLVMdev] question on the status of debugging symbols
Would someone be so kind as to tell me what the status of debugging symbols (DWARF) generated by clang/llvm is? I am on a linux x86-64 system (Fedora 13). Is gdb supposed to understand the generated DWARF? When I generate an executable with "clang -g" followed by "llc -O0" and feed it to gdb, I get "no debugging symbols found". What is the status of lldb on
2008 Apr 13
0
[LLVMdev] Is there a reason why memcmp isn't an intrinsic?
On Apr 13, 2008, at 12:40 PM, Talin wrote: > Since you have memcpy, memmove, and memset in there, I was wondering > why > memcmp wasn't there as well. It seems obvious - which makes me think > that if it's not there, then there must be some reason for it. Why do you want it to be an intrinsic? What does that provide? -Chris
2017 Feb 16
2
multiprecision add/sub
It takes two "llvm.uadd.with.overflow" instances to model the add-with-carry when there is a carry-in. Look at the IR generated by the example. I figured that the optimization of this would be difficult (else it would have already been done :-)). And would this optimization have to be done for every architecture? On 02/15/2017 04:28 PM, Stephen Canon wrote: > > Why do you think
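The point about needing two overflow-checked additions when there is a carry-in can be illustrated in portable C. This is a sketch under assumptions, not the example the message refers to: the names add_carry and add128 are hypothetical, and each "sum plus wrapped-around test" pair below stands in for what one llvm.uadd.with.overflow call computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit add with carry-in and carry-out. */
static uint64_t add_carry(uint64_t a, uint64_t b, unsigned carry_in,
                          unsigned *carry_out)
{
    assert(carry_in <= 1);

    /* First overflow-checked add: a + b. */
    uint64_t partial = a + b;
    unsigned c1 = partial < a;        /* 1 if a + b wrapped around */

    /* Second overflow-checked add: (a + b) + carry_in. */
    uint64_t sum = partial + carry_in;
    unsigned c2 = sum < partial;      /* 1 if adding the carry wrapped */

    /* The two steps cannot both wrap, so OR-ing the flags is enough. */
    *carry_out = c1 | c2;
    return sum;
}

/* Hypothetical two-limb (128-bit) add built from add_carry.
 * Limbs are stored least significant first. Returns the final carry. */
static unsigned add128(const uint64_t x[2], const uint64_t y[2],
                       uint64_t out[2])
{
    unsigned carry;
    out[0] = add_carry(x[0], y[0], 0, &carry);
    out[1] = add_carry(x[1], y[1], carry, &carry);
    return carry;
}
```

On a target with an add-with-carry instruction, the whole body of add_carry ideally collapses into that one instruction; whether a given compiler version does so is the question the thread raises.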