thr3ads.net - search: "sse4"

Displaying 20 results from an estimated 73 matches for "sse4".

2013 Oct 25

[LLVMdev] Bug #16941

Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing: the <8 x i1> mask comes in as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), and immediately afterwards converts back to two XMM registers and does a blend. Converting back and forth has a huge overhead. I'm attaching...
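A minimal sketch of the semantics discussed in this snippet, not of LLVM's actual lowering: a lane-wise select over 8 elements gives the same result when it is evaluated as two independent 4-lane selects, so no mask repacking is required in principle. The function names `select` and `select_split` and the `width` parameter are hypothetical, chosen only for illustration.

```python
def select(mask, a, b):
    """Lane-wise select: take a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def select_split(mask, a, b, width=4):
    """Same result, computed in chunks of `width` lanes (one chunk per register)."""
    out = []
    for i in range(0, len(mask), width):
        # Each chunk uses only its own slice of the mask; nothing is repacked.
        out.extend(select(mask[i:i + width], a[i:i + width], b[i:i + width]))
    return out


mask = [True, False, True, True, False, False, True, False]
a = list(range(8))
b = [10 * v for v in range(8)]
result = select_split(mask, a, b)
```

Here `width=4` stands in for the four doubleword lanes of an XMM register mentioned in the message.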

2013 Oct 21

[LLVMdev] Bug #16941

...or b) code generation for this instruction combination should be tuned. This should benefit LLVM in general IMHO. It may also be the case that this leads to bad code only in our specific environment, but at this point that doesn't seem to be the case. I'll try to come up with a small SSE4 reproducer. By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon. Dmitry. On Mon, Oct 21, 2013 at 10:18 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Dmitry, > > ISPC does some inst...

2015 Oct 21

bad identification of the CPU pentium dual core ( penryn instead of core2 )

LLVM 3.7.0 treats the Pentium Dual-Core (CPU family 6, model 23) as a "penryn" CPU, which triggers a serious bug: OpenGL programs crash when LLVM is used by the Mesa package. LLVM produces binary code with SSE4 instructions, which is not compatible with the Pentium Dual-Core because this CPU doesn't support SSE4 instructions (bad CPU opcodes). With LLVM 3.6.2 this bug doesn't occur because the Pentium Dual-Core was treated as a "core2" CPU, which is the correct behaviour. The llvm git commit who...

2013 Oct 22

[LLVMdev] Bug #16941

On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote: > By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon. > I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem is that <8 x i1> can be legalized to <8 x i32> for YMM, or <8 x i16>...

2013 Oct 26

[LLVMdev] Bug #16941

...that case ISPC should generate two vector operations. Thanks, Nadav On Oct 25, 2013, at 2:16 PM, Dmitry Babokin <babokin at gmail.com> wrote: > Nadav, > > The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing: the <8 x i1> mask comes in as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), and immediately afterwards converts back to two XMM registers and does a blend. Converting back and forth has a huge overhead. > > I'm...

2013 Oct 26

[LLVMdev] Bug #16941

...; vectors operations. > > Thanks, > Nadav > > > On Oct 25, 2013, at 2:16 PM, Dmitry Babokin <babokin at gmail.com> wrote: > > Nadav, > > The problem appears only for vectors longer than available hardware > register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 > on AVX). Select does weird thing. <8 x i1> mask comes as two XMM registers, > select converts them to a single XMM registers (i.e. 8 x 16 bit), > immediately after it converts back to two XMM registers and does blend. > Conversion forth and back has huge overhead...

2013 Oct 21

[LLVMdev] Bug #16941

Nadav, You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's also severely affected. We use intrinsics only for conversion <N x i32> <=> i32, i.e. movmsk.ps. For the rest we use general LLVM instructions. And I would really like to keep it this way. We rely on LLVM's ability to produce efficient code from general LLVM IR....

2013 Oct 21

[LLVMdev] Bug #16941

Hi Dmitry, ISPC does some instruction selection as part of vectorization (on ASTs!) by placing intrinsics for specific operations. The SEXT to i32 pattern was implemented because LLVM did not support vector-selects when this code was written. Can you submit a small SSE4 test case that demonstrates the problem? Select is the canonical form of this operation, and SEXT is usually more difficult to lower. Thanks, Nadav On Oct 21, 2013, at 11:12 AM, Dmitry Babokin <babokin at gmail.com> wrote: > Nadav, > > You are absolutely right, it's ISP...

2020 Jul 23

New x86-64 micro-architecture levels

...2020, Mallappa, Premachandra wrote: > > That's deliberate, so that we can use the same x86-* names for 32-bit library selection (once we define matching micro-architecture levels there). > > Understood. > > > If numbers are out, what should we use instead? > > x86-sse4, x86-avx2, x86-avx512? Would that work? > > Yes please, I think we have to choose somewhere, above would be more > descriptive And IMHO that's exactly the problem. These names should _not_ be descriptive, because any description invokes a wrong feeling of precision. E.g. what F...

2013 Oct 03

[LLVMdev] "target-features" and "target-cpu" attributes

...re introduced. As I understand, they are intended to support generation of "fat binaries" (binaries with functions generated for different CPUs), particularly to support LTO compilation, when different source files have different targets (say, one of the files should support SSE2, another one SSE4). Please correct me if I'm wrong in these assumptions. My attempts to utilize this feature fail (I generate LLVM IR directly, I'm not using clang) and this looks very similar to the one described by Benjamin in this mail thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20...

2009 Jun 22

[LLVMdev] SSE examples

...at the best way to deal with this in LLVM would be, someone else may have a better idea. as for what targets support which operations, in the case of SSE, go check the Intel and AMD64 docs. it can be noted that most processors around now support SSE2, but not as many support newer (SSE3/SSSE3, SSE4, ...). note that Intel and AMD have had a split over the issue: Intel implements SSE3 and SSE4; AMD implements parts of SSE3 and SSE4, but not other parts; AMD is implementing SSE5, but it uses instructions which Intel does not use; ... so, SSE2 is fairly safe at this point, but much newer is an...

2013 Oct 10

[LLVMdev] "target-features" and "target-cpu" attributes

...inaries I mean the binary, where some functions are compiled for one flavor of x86, while others are compiled for another flavor of x86. I care about the usage model, which is important for LTO: a dispatch function (compiled for the least common denominator) plus a set of specialized functions for sse4, avx, avx2, etc., which are called by the dispatch function depending on a runtime CPU ID check. lipo may help achieve this on Darwin, but it's not exactly what I need. I need a solution suitable for LTO. Actually lipo may work for me as a workaround, but I need a cross-platform solution. The cu...
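The usage model described in this snippet (one baseline function plus specialized variants chosen at run time) can be sketched as follows. This is only an illustration of the dispatch pattern under stated assumptions: the feature set is passed in as a plain Python set rather than read from a real CPU ID query, and every name here (`VARIANTS`, `dispatch`, `make_variant`) is hypothetical.

```python
def make_variant(name):
    """Build a stand-in for a function compiled for one x86 flavor."""
    def variant(xs):
        # A real variant would use ISA-specific code; here only the tag differs.
        return (name, sum(xs))
    return variant


# Ordered from most to least specialized; the first supported one wins.
VARIANTS = [
    ("avx2", make_variant("avx2")),
    ("avx", make_variant("avx")),
    ("sse4", make_variant("sse4")),
]

# Built for the least common denominator, so it is always safe to call.
baseline = make_variant("baseline")


def dispatch(features):
    """Return the most specialized variant supported by `features`."""
    for feature, impl in VARIANTS:
        if feature in features:
            return impl
    return baseline


chosen = dispatch({"sse4", "avx"})
```

With the assumed feature set `{"sse4", "avx"}`, `chosen` is the "avx" variant; with an empty set, `dispatch` falls back to `baseline`.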

2009 Jun 21

[LLVMdev] SSE examples

Does anyone have any LLVM IR examples implementing things using the instructions for SSE, like complex arithmetic or 3D vector-matrix stuff? I'd like to have HLVM use them "under the hood" for some things but I cannot see all of the operations that I was expecting (e.g. dot product) and am not sure what works when (e.g. "Not all targets support all types however."). --

2008 May 17

[LLVMdev] I'm new to LLVM

...vm is. i think it said somewhere on the site that it's not a language, it's just used for creating languages. but the people at c-- point me here. so i just want to code in assembler with perhaps some higher-level constructs. will llvm let me do this? also, does llvm support simd up to sse4? and does it have a framework for making windows dll's? (python extensions ftw) can it do coroutines or microthreads? thx for the help -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080517/e857d88e/...

2013 Oct 09

[LLVMdev] "target-features" and "target-cpu" attributes

...re introduced. As I understand, they are intended to support generation of "fat binaries" (binaries with functions generated for different CPUs), particularly to support LTO compilation, when different source files have different targets (say, one of the files should support SSE2, another one SSE4). Please correct me if I'm wrong in these assumptions. > > My attempts to utilize this feature fail (I generate LLVM IR directly, I'm not using clang) and this looks very similar to the one described by Benjamin in this mail thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week...

2013 Oct 11

[LLVMdev] "target-features" and "target-cpu" attributes

...inary, where some > functions are compiled for one flavor of x86, while others are compiled for > another flavor of x86. I care about the usage model, which is important for > LTO: a dispatch function (compiled for the least common denominator) > plus a set of specialized functions for sse4, avx, avx2, etc., which are > called by the dispatch function depending on a runtime CPU ID check. > > > Okay. The terminology was a bit overloaded. :-) > > > lipo may help achieve this on Darwin, but it's not exactly what I > need. I need a solution suitable for LTO. Act...

2010 Mar 03

[LLVMdev] folding x * 0 = 0

On Wednesday 03 March 2010 15:38:06 Chris Lattner wrote: > > Signalling NaN is one case. I'm sure there are others. > > The only other thing I could imagine that it is useful for is for rounding > mode control. Yep. > IMO rounding mode should be explicitly marked on the > instruction as well. That would also be useful for some GPUs where each instruction can specify

2010 Mar 03

[LLVMdev] folding x * 0 = 0

On Mar 3, 2010, at 1:53 PM, David Greene wrote: >> >> IMO rounding mode should be explicitly marked on the >> instruction as well. > > That would also be useful for some GPUs where each instruction can specify > its own rounding mode. SSE4 also has this for at least one conversion instruction. -Chris
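As background for the thread title, a small sketch of why `x * 0` cannot always be folded to `0` for IEEE 754 floating-point values. It covers only quiet NaN, infinity, and signed zero; the signalling-NaN and rounding-mode cases mentioned in the thread are not observable from plain Python and are not modelled here. The helper name `folds_to_positive_zero` is hypothetical.

```python
import math


def folds_to_positive_zero(x):
    """True only if x * 0.0 is exactly +0.0, i.e. the fold would be safe."""
    p = x * 0.0
    # p == 0.0 is also true for -0.0, so check the sign bit separately.
    return p == 0.0 and math.copysign(1.0, p) == 1.0


checks = {
    "positive": folds_to_positive_zero(1.5),          # 1.5 * 0.0 is +0.0
    "negative": folds_to_positive_zero(-1.5),         # -1.5 * 0.0 is -0.0
    "infinity": folds_to_positive_zero(float("inf")), # inf * 0.0 is NaN
    "nan": folds_to_positive_zero(float("nan")),      # NaN * 0.0 is NaN
}
```

Only the "positive" entry is true, which is why such a fold is valid for floating point only under extra assumptions about the operand.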

2008 Mar 18

[LLVMdev] GCC Merge Coming Up

...ming up. If all goes well, this should happen tomorrow (Tuesday). I'll have Tanya disable emails when I apply the patch so that you aren't inundated with a bunch of messages. But I will send out the patch afterwards as an attachment. One of the main reasons for this merge is to get SSE4 support into LLVM-GCC -- Nate Begeman already started adding support for SSE4 on the code-gen side -- as well as testsuite improvements and other general bug fixes. This merge should go *much* more smoothly than the last merge -- it could hardly be worse, right? ;-) I already did a test co...

2020 Jul 21

New x86-64 micro-architecture levels

...g for glibc to load libraries found in that directory just because a system has AVX2 support, because the libraries might also need FMA, based on the Level C definition). On the GCC side, it avoids a confusion between -mavx2 and -march=x86-avx2. If numbers are out, what should we use instead? x86-sse4, x86-avx2, x86-avx512? Would that work? >> * Level A > ... >> * Level B >> This step is so small that it probably can be dropped, unless the benefits from using VEX encoding are truly significant. > > Yes, Agree, the delta is too small, can be clubbed into A or C. Let&...
