thr3ads.net - search: "vstr"

Displaying 19 results from an estimated 19 matches for "vstr".

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 21

Hi Andy, I'm working on defining a new machine model for my target, but I don't understand how to define an in-order machine (reservation tables) in the new model. For example, if the target has IF, ID, EX and WB stages, should I do: let BufferSize=0 in { def IF: ProcResource<1>; def ID: ProcResource<1>; def EX: ProcResource<1>; def WB: ProcResource<1>; } def :

[LLVMdev] LLVM CodeGen Engineer job opening with Apple's compiler team

2011 May 26

Hi all, LLVM CodeGen and Tools team at Apple is looking for exceptional compiler engineers. This is a great opportunity to work with many of the leaders in the LLVM community. If you are interested in this position, please send your resume / CV and relevant information to evan.cheng at apple.com Thanks, Evan Job description The Apple compiler team is seeking an engineer who is strongly

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

...r7, sp sub sp, sp, #36 str r0, [r7, #-4] vmov s0, r0 str r1, [r7, #-8] vmov s1, r1 str r2, [r7, #-12] vmov s2, r2 vldr.32 s3, [r7, #-4] vldr.32 s4, [r7, #-8] vmul.f32 s3, s3, s4 vstr.32 s3, [r7, #-16] vldr.32 s4, [r7, #-12] vcmpe.f32 s3, s4 vmrs apsr_nzcv, fpscr vstr.32 s0, [sp, #16] vstr.32 s2, [sp, #12] vstr.32 s1, [sp, #8] ble LBB20_2 @ BB#1: @ %bb vldr.32 s0, [r7, #-...

create a data frame with the given column names

2011 Feb 16

how do I create a data frame with the given column names _NOT KNOWN IN ADVANCE_? i.e., I have a vector of strings for names and I want to get an _EMPTY_ data frame with these column names. is it at all possible? -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final) http://openvotingconsortium.org http://pmw.org.il http://memri.org http://mideasttruth.com

Suggestions on modeling a micro architecture with per-operand machine model

2018 Feb 26

Hi everyone, I would like to know how to model an instruction that waits some cycles for a pipeline unit to be empty. For example, I have a vstr that waits for the FP pipelines to be empty for at most 3 cycles. I set FP instructions to use a resource unit called FPPipe with resourceCycle=3 and vstr to use FPPipe with resourceCycle=0. So the scheduler will know a vstr will wait 3 cycles if it is scheduled right after an FP instruction. However, this way will...
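The scheme in this snippet can be illustrated with a toy single-issue model. This is a Python sketch, not LLVM's MachineScheduler or its TableGen syntax. The function name `schedule`, the instruction kinds `"fp"`, `"alu"` and `"vstr"`, and the constant `FP_CYCLES` are assumptions made for illustration, and the sketch deliberately ignores how two FP instructions would interact with each other.

```python
# Toy in-order model (an assumption-laden sketch, not LLVM's scheduler):
# an FP instruction reserves a hypothetical FPPipe resource for FP_CYCLES
# cycles, and a vstr may only issue once FPPipe is free again.
FP_CYCLES = 3  # from the snippet: FP instructions hold FPPipe for 3 cycles


def schedule(instrs):
    """Return the issue cycle of each instruction, in program order."""
    cycle = 0        # next cycle in which an instruction may issue
    fp_free_at = 0   # first cycle in which FPPipe is free again
    issue = []
    for kind in instrs:
        if kind == "vstr":
            # vstr occupies FPPipe for 0 cycles but must wait until it is free
            cycle = max(cycle, fp_free_at)
        issue.append(cycle)
        if kind == "fp":
            fp_free_at = cycle + FP_CYCLES
        cycle += 1   # single issue: at most one instruction per cycle
    return issue


print(schedule(["fp", "vstr"]))                # [0, 3]
print(schedule(["fp", "alu", "alu", "vstr"]))  # [0, 1, 2, 3]
```

In this model a vstr placed directly after an FP instruction stalls until cycle 3, while two independent instructions in between hide the wait entirely.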

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 15

On Oct 14, 2013, at 3:27 AM, Zakk <zakk0610 at gmail.com> wrote: > Hi all, > I ran into this problem when compiling the STREAM benchmark (http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched > > The small function is scheduled into good code, but if opt inlines this function, the inlined part is scheduled into bad code. A bug for this is welcome. Pretty soon, I’ll

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 14

Hi all, I ran into this problem when compiling the STREAM benchmark ( http://www.cs.virginia.edu/stream/FTP/Code/) with enable-misched. The small function is scheduled into good code, but if opt inlines this function, the inlined part is scheduled into bad code. So I wrote a simple test case (foo.c, at the attached link) and compiled it with two different methods: *method A:* *$clang -O3 foo.c -static -S

[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 09

...vtrn_f32(tv.val[0], ZERO); > + tv.val[0] = vadd_f32(tv.val[0], tv.val[1]); > + > + vst1_lane_f32(&sumi, tv.val[0], 0); Accessing tv.val[0] and tv.val[1] directly seems to send these values through the stack, e.g., f4: f3ba7085 vtrn.32 d7, d5 f8: ed0b7b0f vstr d7, [fp, #-60] fc: ed0b5b0d vstr d5, [fp, #-52] ... 114: ed1b6b09 vldr d6, [fp, #-36] 118: ed1b7b0b vldr d7, [fp, #-44] 11c: f2077d06 vadd.f32 d7, d7, d6 120: f483780f vst1.32 {d7[0]}, [r3] Can't you just use float32...

[LLVMdev] Vectors in structures

2010 Sep 21

...ector types are considered proper types in LLVM, why pack them inside structures? That results in a lot of boilerplate code for converting and copying the values (about 20 lines of IR) just to call a NEON instruction that, in the end, will be converted into three instructions: VLDR + {whatever} + VSTR If the load and store are normally performed by one operation (I assume it's the same on Intel and others), why bother with the structure passing instead of just using load/store for vector types? Also, the extra struct { [i8 x 8] } for memcopy seems also redundant. If you're explicitly t...

[LLVMdev] MI scheduler produce badly code with inline function

2013 Oct 16

...lr} movw r12, :lower16:c movw lr, :lower16:b movw r3, #9216 movt r12, :upper16:c mov r1, #0 vmov.f64 d16, #3.000000e+00 movt lr, :upper16:b movt r3, #244 .LBB0_1: add r0, r12, r1 add r2, lr, r1 *vldr d17, [r0]* add r1, r1, #32 vmul.f64 d17, d17, d16 cmp r1, r3 vstr d17, [r2] * vldr d17, [r0, #8]* vmul.f64 d17, d17, d16 * * vstr d17, [r2, #8] * vldr d17, [r0, #16]* vmul.f64 d17, d17, d16 vstr d17, [r2, #16] * vldr d17, [r0, #24]* vmul.f64 d17, d17, d16 vstr d17, [r2, #24] bne .LBB0_1 pop {lr} bx lr .Ltmp0: Using Itinerary will ge...

[LLVMdev] JIT on armhf, again

2014 Jul 23

On 7/23/14, 3:30 PM, Tim Northover wrote: [...] > It looks like it's a case of calling Module::setTargetTriple. As with > most JIT setup questions, though, often the best way to find out is to > get something working in lli and then look at what it does > (tools/lli/lli.cpp). Well, it's *almost* working --- hardfloat code is now being generated, and it even seems to be right

[LLVMdev] JIT on armhf, again

2014 Jul 24

On 7/24/14, 7:18 AM, Tim Northover wrote: [...] > Which triple are you using? And is the correct code used when you run > the same IR through "llc -mtriple=whatever"? armv7-linux-gnueabihf, as suggested; and if I use llc -mtriple then the code compiles to: vstr s0, [r0] bx lr ...which I would consider correct. (What's more interesting is *without* specifying the triple llc generates armel code. Should llc default to generating code which will actually run on a given platform? Is it possible my version of llvm has been compiled with the wrong option...

[LLVMdev] Vectors in structures

2010 Sep 21

...ype of arguments, but that is not yet implemented. > > That results in a lot of boilerplate code for converting and copying > the values (about 20 lines of IR) just to call a NEON instruction > that, in the end, will be converted into three instructions: > > VLDR + {whatever} + VSTR > > If the load and store are normally performed by one operation (I > assume it's the same on Intel and others), why bother with the > structure passing instead of just using load/store for vector types? As you noted, the struct wrappers produce a lot of extra code but it should...

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 20

Dear community, I ran into code generated by LLVM whose assembly instructions rely on 8-byte alignment for structures on the stack. The relevant part of the Objective-C code is the following: -(void)getCharacters:(unichar *)unicode { NSRange range; range.location = 0; range.length = [self length]; printf("%p, %p\n", &range.location, &range.length); And

[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 07

Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more in-line with algorithm in celt_pitch_xcorr_arm.s Viswanath Puttagunta

[LLVMdev] question about alignment of structures on the stack (arm 32)

2015 Apr 21

...ix ----- And we get the following assembly code: main: push {r11, lr} mov r11, sp sub sp, sp, #24 mov r0, #0 str r0, [r11, #-4] add r1, sp, #8 movw r2, :lower16:.Lmain.mStruct movt r2, :upper16:.Lmain.mStruct vldr d16, [r2] vstr d16, [sp, #8] orr r2, r1, #4 movw r3, :lower16:.L.str movt r3, :upper16:.L.str str r0, [sp, #4] mov r0, r3 bl printf ldr r1, [sp, #4] str r0, [sp] mov r0, r1 mov sp, r11 pop {r11, pc} r2 is populated with r1 plus 4 (but plu...
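The listing computes a field address with `add r1, sp, #8` followed by `orr r2, r1, #4`. A bitwise OR with 4 equals an addition of 4 only when bit 2 of the base is clear, which an 8-byte-aligned stack pointer guarantees. The Python sketch below only illustrates that arithmetic; the helper names and the two sample stack-pointer values are hypothetical and are not taken from the thread.

```python
def field_addr_orr(sp):
    """Mirror `add r1, sp, #8` then `orr r2, r1, #4`."""
    base = sp + 8
    return base | 4  # sets bit 2 of the base address


def field_addr_add(sp):
    """What a plain `base + 4` would compute instead."""
    base = sp + 8
    return base + 4


# Hypothetical stack pointers: one 8-byte aligned, one only 4-byte aligned.
for sp in (0x1000, 0x1004):
    same = field_addr_orr(sp) == field_addr_add(sp)
    print(hex(sp), hex(field_addr_orr(sp)), hex(field_addr_add(sp)), same)
```

With `sp = 0x1000` both forms yield `0x100c`. With `sp = 0x1004` the OR leaves `0x100c` unchanged because bit 2 is already set, while the addition gives `0x1010`, so the two addresses diverge once the stack is only 4-byte aligned.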

[LLVMdev] Thumb-2 code generation error in Apple LLVM at all optimization levels

2011 Nov 12

...24_11: add r1, pc ldr r1, [r1] blx _objc_msgSend str r0, [sp, #28] .loc 1 399 69 ldr r0, [sp, #28] ldr r1, [sp, #40] ldr.n r2, LCPI24_10 LPC24_10: add r2, pc ldr r2, [r2] add r1, r2 ldr r2, [r1] ldr.n r1, LCPI24_9 LPC24_9: add r1, pc ldr r1, [r1] blx _objc_msgSend vmov d16, r0, r1 vstr.64 d16, [sp, #16] .loc 1 401 2 ldr r0, [sp, #40] ldr.n r1, LCPI24_8 LPC24_8: add r1, pc ldr r1, [r1] add r0, r1 ldr r0, [r0] ldr.n r1, LCPI24_7 LPC24_7: add r1, pc ldr r1, [r1] blx _objc_msgSend .loc 1 402 2 ldr r0, [sp, #28] ldr.n r1, LCPI24_6 LPC24_6: add r1, pc ldr r1, [r1] blx...

[LLVMdev] [global-isel] Proposal for a global instruction selector

2013 Aug 08

...if f64 is a legal type, so is i64, v2f32, and even v64i1. On the ARM target, for example, these types would be legal: All 8-bit types via ldrb/strb to GPR. (i8, v1i8, v2i4, v4i2, v8i1) All 16-bit types via ldrh/strh to GPR. (i16, f16, v1i16, v2i8, ...) All 32-bit types via ldr/str to GPR and vldr/vstr to SPR. All 64-bit types via ldrd/strd to GPRPair and vldr/vstr to DPR. All 128-bit types via vld1/vst1 to DPair. All 192-bit types via vld1/vst1 to DTriple. All 256-bit types via vld1/vst1 to DQuad. This larger set of legal types also makes it easier to handle things like extractelement <8 x i8...
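The idea in this snippet, that legality depends on a type's total bit width rather than its element type, can be sketched as a lookup table. The table entries below are transcribed from the snippet's ARM example; the names `ARM_ACCESS`, `type_bits` and `access_options` are hypothetical, and the sketch is plain Python, not part of LLVM.

```python
import re

# Width (in bits) -> list of (load/store pair, register class),
# transcribed from the ARM example in the snippet.
ARM_ACCESS = {
    8: [("ldrb/strb", "GPR")],
    16: [("ldrh/strh", "GPR")],
    32: [("ldr/str", "GPR"), ("vldr/vstr", "SPR")],
    64: [("ldrd/strd", "GPRPair"), ("vldr/vstr", "DPR")],
    128: [("vld1/vst1", "DPair")],
    192: [("vld1/vst1", "DTriple")],
    256: [("vld1/vst1", "DQuad")],
}


def type_bits(name):
    """Total bit width of a type name such as 'i64', 'f64' or 'v2f32'."""
    m = re.fullmatch(r"(?:v(\d+))?[if](\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised type name: {name}")
    lanes = int(m.group(1)) if m.group(1) else 1
    return lanes * int(m.group(2))


def access_options(name):
    """Load/store options for a type; an empty list means no table entry."""
    return ARM_ACCESS.get(type_bits(name), [])


for t in ("f64", "i64", "v2f32", "v64i1"):
    print(t, type_bits(t), access_options(t))
```

All four types printed by the loop are 64 bits wide, so they share the same two options, which matches the snippet's remark that if f64 is legal then so are i64, v2f32 and v64i1.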

[LLVMdev] DebugInfo: Missing non-trivially-copyable parameters in SelectionDAG

2013 Jun 24

...st-isel-conversion.ll --check-prefix=THUMB -- Exit Code: 1 Command Output (stderr): -- /usr/local/google/home/blaikie/dev/llvm/src/test/CodeGen/ARM/fast-isel-conversion.ll:23:8: error: expected string not found in input ; ARM: sitofp_single_i16 ^ <stdin>:43:2: note: scanning from here vstr s0, [sp] ^ <stdin>:113:10: note: possible intended match here .globl _uitofp_single_i16 ^ -- ******************** FAIL: LLVM :: CodeGen/R600/store.ll (9 of 51) ******************** TEST 'LLVM :: CodeGen/R600/store.ll' FAILED ******************** Script: -- /usr/local/googl...
