thr3ads.net - similar to: "[LLVMdev] Using intrinsics with memory operands"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Using intrinsics with memory operands"

[LLVMdev] Using intrinsics with memory operands

2008 Aug 01

[LLVMdev] Using intrinsics with memory operands

On Fri, Aug 1, 2008 at 12:10 AM, Nicolas Capens <nicolas at capens.net> wrote: > I was wondering how to use variations of intrinsic functions that take a > memory operand. Often, for intrinsics where it matters, there's a variant of the intrinsic that takes a pointer operand that you can use, although it looks like there isn't one here. > Take for example the SSE4.1

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 12

X86 TRUNCATE cost for AVX & AVX2 mode

<Copied Cong> Thanks Elena. Mostly I was interested in why such a high cost 30 kept for TRUNCATE v16i32 to v16i8 in SSE41. Looking at the code it appears like TRUNCATE v16i32 to v16i8 in SSE41 is very expensive vs SSE2. I feel this number should be same/close to the cost mentioned for same operation in SSE2ConversionTbl. Below patch from Cong Hou reduce cost for same operation in SSE2

X86 TRUNCATE cost for AVX & AVX2 mode

2016 Apr 11

X86 TRUNCATE cost for AVX & AVX2 mode

Hi, I was going through the X86TTIImpl::getCastInstrCost, and got a doubt on cost calculation for TRUNCATE instruction in AVX mode. In AVX2ConversionTbl & AVXConversionTbl table there is no cost defined for TRUNCATE v16i32 to v16i8, as a fallback it goes to SSE41ConversionTbl table and there it finds cost as 30 for this operation. 30 cost for this operation looks very high. Wondering why

Help with ISEL matching for an SDAG

2016 Jul 29

Help with ISEL matching for an SDAG

I have the following selection DAG: SelectionDAG has 9 nodes: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t16: i32,ch = load<LD1[%ptr](tbaa=<0x10023c9f448>), anyext from i8> t0, t2, undef:i64 t15: v16i8 = BUILD_VECTOR t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16, t16 t11: ch,glue = CopyToReg t0, Register:v16i8 %V2, t15

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

Hi, When setting -mattr option on X86, I would like to treat MMX separately from SSE levels. This would allow a client who sets the attributes directly to set the SSE level independent of MMX, e.g., llc -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while llc -march=x86 -mattr=mmx -mattr=sse42 will get mmx and sse42. If anyone objects to this change, please let me

LLVM and Xeon Skylake v5

2017 May 08

LLVM and Xeon Skylake v5

getProcessTriple just determines operation system, and architecture. It doesn't deal with specific instruction set features. The CPU should be controlled by MCPU on the EngineBuilder i think. The CPU autodetection code lives in getHostCPUName in lib/Support/Host.cpp, but I don't think the JIT calls into. I think its expected the user would call it or pass a specific CPU string to the MCPU

[LLVMdev] branch on vector compare?

2012 Sep 04

[LLVMdev] branch on vector compare?

Roland Scheidegger <sroland <at> vmware.com> writes: > This looks quite similar to something I filed a bug on (12312). Michael > Liao submitted fixes for this, so I think > if you change it to > %16 = fcmp ogt <4 x float> %15, %cr > %17 = sext <4 x i1> %16 to <4 x i32> > %18 = bitcast <4 x i32> %17 to i128 > %19 = icmp ne i128 %18, 0

[LLVMdev] branch on vector compare?

2012 Sep 05

[LLVMdev] branch on vector compare?

Am 05.09.2012 00:24, schrieb Stephen: > Roland Scheidegger <sroland <at> vmware.com> writes: >> This looks quite similar to something I filed a bug on (12312). Michael >> Liao submitted fixes for this, so I think >> if you change it to >> %16 = fcmp ogt <4 x float> %15, %cr >> %17 = sext <4 x i1> %16 to <4 x i32> >> %18 =

LLVM and Xeon Skylake v5

2017 May 08

LLVM and Xeon Skylake v5

Thank you. I'm letting it auto detect by setting the target using getProcessTarget. I disabled avx512 support by passing -avx512f (and the other variants) to setMAttrs on EngineBuilder. I can see refs to avx512 in X86.td. It's the exact same executable running on Kabylake. What does the Cannot select: specifically mean? Is there some table that doesn't have a definition for a key in

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

Might you instead consider just adding a -disable-mmx option? Preston On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

On Nov 19, 2008, at 11:57 PMPST, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while > llc -march=x86 -mattr=mmx

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

Hi Dale, I will not change the default. I would dislike to see any regressions due to this type of change. -- Mon Ping On Nov 20, 2008, at 10:12 AM, Dale Johannesen wrote: > > On Nov 19, 2008, at 11:57 PMPST, Mon Ping Wang wrote: > >> Hi, >> >> When setting -mattr option on X86, I would like to treat MMX >> separately from SSE levels. This would allow a

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

[LLVMdev] changing -mattr behavior with mmx and sse

On Nov 20, 2008, at 8:31 AM, Preston Gurd wrote: > Might you instead consider just adding a -disable-mmx option? I agree, this is a better approach. This distinguishes between capabilities of the chip and the desire to codegen specific vectors one way or another. -Chris > > Preston > > On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote: >> Hi, >> >>

x86_64 SSE2/SSE41 optim not used

2014 Mar 11

x86_64 SSE2/SSE41 optim not used

Hi Guys, In stream_decoder.c when assigning lpc restore function, only IA32 processor benefits from SS2 and SSE4.1 optimization. Shouldn't it be the case for x86_64 processor as well ? Thanks, -- Olivier TRISTAN uvi.net -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.xiph.org/pipermail/flac-dev/attachments/20140311/1d49b5c2/attachment.htm

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explicitly said that we currently lack of

FLAC 1.3.2 has been released

2017 Jan 02

FLAC 1.3.2 has been released

Janne Hyvärinen wrote: > That shouldn't matter. I realise that, but I wanted a better idea about how many people this is like to affect. If you were on Windows XP on some old processor this would probably not affect many people, but since you are on Windows 10 with a Core i7 thats a different matter. Erik -- ---------------------------------------------------------------------- Erik de

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 08

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote: > > Sure, > > Here is the command line: > clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114

[LLVMdev] Generating movq2dq using IRBuilder

2008 Jul 31

[LLVMdev] Generating movq2dq using IRBuilder

On Jul 31, 2008, at 7:22 AM, Nicolas Capens wrote: > In the same breath I’d also like to kindly ask if someone could have > a look at the reverse operations, namely trunk from 128 to 64 bit > using movdq2q, and 128 to 32 and 64 to 32 using movd. This also > seems related to Bug 2585. Thanks again. The operations you're describing can be represented as insertelement and

[LLVMdev] Vector cast

2008 Jun 21

[LLVMdev] Vector cast

Hi all, I seem to be unable to cast a vector of integers to a vector of floats (uitofp [4 x i8] to [4 x float], to be exact). It hits an assert in LegalizeDAG.cpp line 5433: "Unknown int value type". The Assembly Language Reference Manual's definition of uitofp doesn't indicate that this is unsupported, so it looks like a bug to me. I'm on an x86 system by the way. My

[LLVMdev] branch on vector compare?

2012 Sep 03

[LLVMdev] branch on vector compare?

> > which goes through memory. Is there some idiom I'm missing so that it would use > > for instance movmsk for SSE or vcmpgt & cr6 for altivec? > > I don't think you are missing anything: LLVM IR has no support for horizontal > operations like or'ing the elements of a vector of boolean together. The code > generators do try to recognize a few idioms and

similar to: [LLVMdev] Using intrinsics with memory operands