similar to: [LLVMdev] problem with X86's AVX assembler?

Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] problem with X86's AVX assembler?"

2014 Jun 26
2
[LLVMdev] problem with X86's AVX assembler?
On Thu, Jun 26, 2014 at 5:47 AM, Adam Nemet <anemet at apple.com> wrote: > Hi Jun, > > On Jun 25, 2014, at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote: > > > Hi, > > > > I am trying to assemble below instruction with latest LLVM code, but > fail. Am I doing something wrong, or is this a bug? > > > > > > $ echo "vaddps zmm7
2014 Jun 26
2
[LLVMdev] problem with X86's AVX assembler?
On Thu, Jun 26, 2014 at 10:23 AM, Adam Nemet <anemet at apple.com> wrote: > > > On Jun 25, 2014, at 7:05 PM, Jun Koi <junkoi2004 at gmail.com> wrote: > > > > > On Thu, Jun 26, 2014 at 5:47 AM, Adam Nemet <anemet at apple.com> wrote: > >> Hi Jun, >> >> On Jun 25, 2014, at 8:14 AM, Jun Koi <junkoi2004 at gmail.com> wrote: >>
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads 64-bit constants into zmm22 that are the indexes themselves (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. zmm2 contains the constant 4000, so vpmuludq zmm14, zmm10, zmm2 ; multiplies the index values by 4000, since the stride for array b is 4000. zmm14=
2019 Sep 02
3
AVX2 codegen - question reg. FMA generation
Hello, On the appended, reasonably simple test case that has an fmul/fadd sequence on <8 x float> vector types, I don't see the x86-64 code generator (with cpu set to haswell or later types) turning it into AVX2 FMA instructions. Here's the snippet in the output it generates: $ llc -O3 -mcpu=skylake --------------------- .LBB0_2: # =>This Inner
2012 May 24
4
[LLVMdev] use AVX automatically if present
I wonder why AVX is not used automatically if it is available on the host machine. In contrast, SSE41 instructions (like pmulld) are used automatically if the host machine supports SSE41. E.g. $ cat avx.ll define void @_fun1(<8 x float>*, <8 x float>*) { _L1: %x = load <8 x float>* %0 %y = load <8 x float>* %1 %z = fadd <8 x float> %x, %y store
2012 May 24
0
[LLVMdev] use AVX automatically if present
On Thu, 24 May 2012, Pan, Wei wrote: > Very likely AVX is not enabled in your llc. This feature was enabled > just recently (late April). I forgot to mention that I am using a recent LLVM-3.1, and in principle my llc knows about avx, as I have shown in the second example. But avx does not seem to be used by default. On Thu, 24 May 2012, Henning Thielemann wrote: > $ llc -o - -mattr
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
This is the wrong code: declare <16 x float> @foo(<16 x float>) define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind { entry: %x1 = fadd <16 x float> %x, %y %call = call <16 x float> @foo(<16 x float> %x1) nounwind %y1 = fsub <16 x float> %call, %y ret <16 x float> %y1 } ./llc -mattr=+avx
2005 Feb 23
2
Creating extension groups
Hi, I want to create 2 groups of extensions. For example, group 1 can't make outgoing calls; they can only call other extensions and extensions of group 2. Group 2 can call any of the extensions, and they can also make outgoing calls using our SIP server. Please let me know how to do this. I was going through the docs and I saw that I have to specify a group in zapata.conf; this is not clear, please
2017 Jun 06
3
Tables in R
Hi doblett, I think the stargazer package can help you: https://cran.r-project.org/web/packages/stargazer/index.html http://jakeruss.com/cheatsheets/stargazer.html Regards. On June 6, 2017, at 15:17, Álvaro Hernández <alvarohv at um.es> wrote: > Hi, doblett: > > I normally use the 'tables' package, which has > fairly good documentation.
2009 Jun 12
1
Can't get F77_CALL(dgemm) to work [SEC=UNCLASSIFIED]
Hi, I am new to writing C code and am trying to write an R extension in C. I have hit a wall with F77_CALL(dgemm) in that it produces wrong results. The code below is a simplified example that multiplies the matrices Am and Bm to give Cm. The results below show clearly that Cm is wrong. Am= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Bm= 1 1 1 1 1
2017 Jan 13
3
input in Markdown
Hello list: A quick question (I hope). Is it possible to do an "include" or an "input" in Markdown, without using Rmarkdown or knitr, just pure Markdown, so that, for example, GitHub or a basic Markdown viewer renders it? The idea is for one file to show (contain) everything while keeping the content spread across several files. Thanks. -- Antonio Maurandi López Sec. Apoyo
2009 Dec 17
1
[LLVMdev] Merging AVX
I'd like to start moving some of our AVX work into trunk. We've got quite a bit of it implemented already. I wanted to make sure we got something that would work and remain relatively stable. Here's how I'd like to do this. First, I have some more TableGen fixes and enhancements that are prereqs for AVX templates. I'd like to get those in first. Then there are a number of
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Hi Frank, What does --debug-only=vectorize say? You may try to get the datalayout and the triple on the IR header, just to make sure you got everything right. LLVM will honour those, and front-ends should create them correctly. --renato On 1 July 2015 at 19:06, Frank Winter <fwinter at jlab.org> wrote: > I realized that the function parameters had no alignment attributes on them.
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell. $ clang -march=core-avx2 -O3 -S -o - test.c .section __TEXT,__text,regular,pure_instructions .globl _f .align 4, 0x90 _f: ## @f
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
Frank, It sounds like the SLP vectorizer thinks that it is more profitable to use 128bit wide operations (because 256bit operations are double pumped on Sandybridge). Did you see a different result on Haswell? Thanks, Nadav > On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote: > > I realized that the function parameters had no alignment attributes on them.
2016 Oct 26
2
removing text from a plot
Hello everyone, I am sending you a question that I consider simple but that I am finding impossible to solve. If you run the following code, you will get the plot I am attaching: library(ltm) modelo <- rasch(LSAT) plot(modelo, main="Curva probabilidad pregunta 1",legend = TRUE, cx = "bottomright", items=1,xlab="Conocimiento",ylab="Probabilidad") It turns out
2017 Aug 06
2
VBROADCAST Implementation Issues
I want to implement gather for v64i32. I wrote the following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2013 Oct 02
0
Fwd: Seminar on small area estimation. "Small Area estimation"
Dear colleagues, I am writing to announce to the list that next Tuesday (October 8) at *13:00*, at the *Instituto de Ciencias Matemáticas del CSIC-* *ICMAT (Universidad Autónoma, Cantoblanco, see location <http://www.icmat.es/facilities/howtoarrive>) (Aula Gris 1), Madrid*, there will be a seminar on "Small area estimation" ( "Small
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello, I did as you said. Please tell me whether the following is correct now. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode
2017 Mar 07
0
Potential clue for Bug 16975 - lme fixed sigma - inconsistent REML estimation
Dear list, I was trying to create a VarClass for nlme to work with Fay-Herriot (FH) models. The idea was to create a modification of VarComb that, instead of multiplying the variance functions, sums them (I called it varSum). After some failed attempts, I found that I was not getting the expected results because I needed to make sigma fixed. Trying to find out how to make sigma fixed, I ran into