thr3ads.net - search: "16b"

Displaying 20 results from an estimated 108 matches for "16b".

2007 Oct 02

plot question

Hello, I have a question about how to plot a series of data. The following is my data matrix of n > n 25p 5p 2.5p 0.5p 16B-E06.g 45379 4383 5123 45 16B-E06.g 45138 4028 6249 52 16B-E06.g 48457 4267 5470 54 16B-E06.g 47740 4676 6769 48 37B-B02.g 42860 6152 19276 72 35B-A02.g 48325 12863 38274 143 35B-A02.g 48410 12806 39013 175 35B-A02.g 48417 9057 40923 176 35B-A02.g 51403 13865 43338 161 45B-C1...

[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

2015 Nov 23

...not litter it with #ifdef's unless there's a large difference between the platforms. It looks like Clang (the version in Xcode 7.1.1, at least) is smart enough to optimize the first two operations you mention, figuring out sshll2 and smlal2 properly, though the third causes a gratuitous extra "ext.16b" to be generated. I've filed a missed-optimization bug on Clang for the latter. Here's the code it generates: _silk_NSQ_noise_shape_feedback_loop_neon: 000000000000004c ldr w9, [x0] 0000000000000050 cmp w3, #8 0000000000000054 b.ne 0x9c 0000000000000058 d...

[LLVMdev] Testcases where GVN uses too much memory?

2014 May 03

I've heard a few times, "GVN uses too much memory." The real fix is probably a rewrite of some sort, but that's not what this email is about. I have a few patches that should *incrementally* reduce its memory usage. Keyword being "should", because I haven't observed an improvement... in fact, I haven't seen it using much memory at all. Does anyone have a

[LLVMdev] Missing optimization - constant parameter

2013 Aug 02

...ll = tail call i64 @xtr(i64 12345123400) #2 ret i64 %call } Which is probably the best representation to have at this relatively high level. At the machine level it looks like it is the register coalescer that is duplicating the constant. It transforms 0B BB#0: derived from LLVM BB %entry 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg1 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[@...

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 22

Yeah, the alternative is to do movabs and then test, which is doable but I’m not sure if it’s worth it (surely BT + risk of flags merging penalty has to be better than two ops, one of which is ~9-10 bytes). Fiona > On Jan 22, 2015, at 2:59 PM, Chris Sears <chris.sears at gmail.com> wrote: > > My bad on that. So that's what the comment meant. > That means BT is pretty much

[LLVMdev] Missing optimization - constant parameter

2013 Aug 02

For the little C test program where a constant is stored in memory and also used as a parameter: #include <stdint.h> uint64_t val, *p; extern uint64_t xtr( uint64_t); uint64_t caller() { uint64_t x; p = &val; x = 12345123400L; *p = x; return xtr(x); } clang (3.2, 3.3 and svn) generates the following X86 code (at -O3): caller: movq

RFC: Should SmallVectors be smaller?

2018 Jun 21

I've been curious for a while whether SmallVectors have the right speed/memory tradeoff. It would be straightforward to shave off a couple of pointers (1 pointer/4B on 32-bit; 2 pointers/16B on 64-bit) if users could afford to test for small-mode vs. large-mode. The current scheme works out to something like this: ``` template <class T, size_t SmallCapacity> struct SmallVector { T *BeginX, *EndX, *CapacityX; T Small[SmallCapacity]; bool isSmall() const { return BeginX ==...
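The three-pointer scheme quoted above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchSmallVector`, its growth policy, and the restriction to trivially copyable element types are hypothetical and are not LLVM's actual `SmallVector`. The sketch only shows how comparing `BeginX` against the inline buffer distinguishes small mode from large mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified small-buffer vector (NOT LLVM's SmallVector).
template <class T, std::size_t SmallCapacity>
struct SketchSmallVector {
  static_assert(SmallCapacity > 0, "sketch needs a non-empty inline buffer");
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch only handles trivially copyable element types");

  T *BeginX, *EndX, *CapacityX; // three pointers, as in the quoted layout
  T Small[SmallCapacity];       // inline storage used while in small mode

  SketchSmallVector()
      : BeginX(Small), EndX(Small), CapacityX(Small + SmallCapacity) {}
  SketchSmallVector(const SketchSmallVector &) = delete;
  SketchSmallVector &operator=(const SketchSmallVector &) = delete;
  ~SketchSmallVector() {
    if (!isSmall())
      std::free(BeginX);
  }

  // Small mode means the data pointer still refers to the inline buffer.
  bool isSmall() const { return BeginX == Small; }
  std::size_t size() const { return static_cast<std::size_t>(EndX - BeginX); }

  T &operator[](std::size_t I) {
    assert(I < size() && "index out of range");
    return BeginX[I];
  }

  void push_back(const T &V) {
    if (EndX == CapacityX)
      grow();
    *EndX++ = V;
  }

private:
  // Assumed policy: double the capacity and move to heap storage.
  void grow() {
    std::size_t N = size();
    std::size_t NewCap = 2 * N;
    T *NewBuf = static_cast<T *>(std::malloc(NewCap * sizeof(T)));
    if (!NewBuf)
      std::abort();
    std::memcpy(NewBuf, BeginX, N * sizeof(T));
    if (!isSmall())
      std::free(BeginX);
    BeginX = NewBuf;
    EndX = NewBuf + N;
    CapacityX = NewBuf + NewCap;
  }
};
```

With a `SketchSmallVector<int, 4>`, the first four `push_back` calls stay in the inline buffer and the fifth switches the container to heap storage, at which point `isSmall()` becomes false.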

[LLVMdev] Missing optimization - constant parameter

2013 Aug 02

...ret i64 %call > } > > Which is probably the best representation to have at this relatively high level. > > At the machine level it looks like it is the register coalescer that > is duplicating the constant. It transforms > > 0B BB#0: derived from LLVM BB %entry > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > %noreg; mem:LD8[GOT] GR64:%vreg1 > 48B MOV64mr %vreg1, 1, %noreg, 0, %nore...

Porting Vorbis lib on Ti DSP ? How to ?

2002 Apr 09

...mpile the original library. Will it really be possible without too much painful work? Does anybody have ideas on how to do that, and about benchmarks for MIPS and memory on an embedded platform? What I see for the straight data format assignment:

Type            Standard C   C54x Compiler
char            8b           16b
short, ushort   16b          16b
int, uint       32b          16b
long, ulong     32b          32b
float           32b          32b
double          64b          32b
long double     64b          32b

So I will have to do int (std C) -> float (DSP C), double (std C) -> float (DSP C), or even create special routines to handle bigger data (64b). For benchmarking, I have seen:...
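A size table like the one in this post can be checked on any toolchain with a small compile-time helper. The sketch below is an illustration only: `bitsOf` and the `sample32` alias are hypothetical names, and using a least-width type from `<cstdint>` is one possible porting approach, not something proposed in the thread.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Storage width of T in bits on the current target. CHAR_BIT is 8 on most
// hosts; the table in the post lists char as 16b on the C54x compiler.
template <class T>
constexpr std::size_t bitsOf() {
  return sizeof(T) * CHAR_BIT;
}

// Hypothetical alias: int_least32_t is guaranteed to hold at least 32 bits,
// even on a target where plain int is only 16 bits wide.
using sample32 = std::int_least32_t;

static_assert(bitsOf<sample32>() >= 32, "sample32 must hold 32 bits");
static_assert(bitsOf<long>() >= 32, "long is at least 32 bits in standard C");
```

Code that needs 32-bit samples could then use `sample32` rather than `int`, so the same source keeps its range on both the host and the DSP.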

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 24

...ether the -Oz (optimize for size) flag is set or whether the containing function's PGO cold attribute is set. If either is true, it emits BT for tests of bits 8-31 instead of TEST. Previously, TEST was always used for bits 0-31 and BT was always used for bits 32-63. Since the BT instruction is 16b smaller than TEST for the bits 8-31 case, 32b vs 48b, and not irredeemably slower, it makes sense to use BT in cases where size matters. Similar logic is possible for BTC and BTS. However, LowerToBTC and LowerToBTS would need to be written and used and that's a larger patch. -------------- nex...
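The 32b vs 48b figures can be reproduced by adding up the instruction encoding fields. The sketch below assumes the register forms `TEST r/m32, imm32` (F7 /0 id) and `BT r/m32, imm8` (0F BA /4 ib), plus a REX.W prefix for the 64-bit BT. The `Encoding` struct and `encodedBits` helper are hypothetical names used only for this arithmetic.

```cpp
// Byte counts of each encoding field for a register-operand instruction.
struct Encoding {
  unsigned Prefix; // REX prefix bytes
  unsigned Opcode; // opcode bytes
  unsigned ModRM;  // ModRM byte
  unsigned Imm;    // immediate bytes
};

// Total encoded size in bits.
constexpr unsigned encodedBits(Encoding E) {
  return 8 * (E.Prefix + E.Opcode + E.ModRM + E.Imm);
}

constexpr Encoding Test32{0, 1, 1, 4}; // F7 /0 id:          TEST r/m32, imm32
constexpr Encoding Bt32{0, 2, 1, 1};   // 0F BA /4 ib:       BT r/m32, imm8
constexpr Encoding Bt64{1, 2, 1, 1};   // REX.W 0F BA /4 ib: BT r/m64, imm8

// The 16b saving quoted in the post: 48b TEST vs 32b BT.
static_assert(encodedBits(Test32) - encodedBits(Bt32) == 16,
              "BT saves 16 bits over TEST imm32");
```

Under these assumptions the 64-bit BT comes to 40 bits, so it is still smaller than the 48-bit TEST.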

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 20

> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 22

On Thu Jan 22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > You'll want -Oz here, Os isn't supposed to affect the runtime as much as this is going to. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm....

pre-RA scheduling/live register analysis optimization (handle move) forcing spill of registers

2018 Apr 23

...how to avoid this counterproductive optimization. TIA, Dominique Torette. # *** IR Dump After MachineFunction Printer ***: # Machine code for function addproddivConst: Post SSA Function Live Ins: %FA_ROFF1 in %vreg0 0B BB#0: derived from LLVM BB %entry Live Ins: %FA_ROFF1 16B %vreg0<def> = COPY %FA_ROFF1; FPUaOffsetClass:%vreg0 32B %vreg2<def> = MOVSUTO_A_iSLo 1077936128; FPUaOffsetClass:%vreg2 48B %vreg3<def> = FMUL_A_oo %vreg0, %vreg2, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg3 FPUaOffsetClass:%vr...

Vector evolution?

2020 Sep 01

...rize \ -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast -mrecip=all:0 \ -c -o vec.o vec.cc I get the following codegen: 0000000000000160 <_Z4fct6PDv4_f>: 160: 31 c0 xor %eax,%eax 162: c4 e2 79 18 05 00 00 vbroadcastss 0x0(%rip),%xmm0 # 16b <_Z4fct6PDv4_f+0xb> 169: 00 00 16b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 170: c5 f8 59 0c 07 vmulps (%rdi,%rax,1),%xmm0,%xmm1 175: c5 f8 29 0c 07 vmovaps %xmm1,(%rdi,%rax,1) 17a: c5 f8 59 4c 07 10 vmulps 0x10(%rdi,%rax,1),%xmm0,%xmm1 180: c5 f8 29 4c 07 10 v...

LiveInterval error with 2 dead defs

2019 Sep 09

...ify-misched foo.mir # Before machine scheduling. ********** INTERVALS ********** %0 [16r,16d:1)[32r,32d:0) 0 at 32r 1 at 16r weight:0.000000e+00 RegMasks: ********** MACHINEINSTRS ********** # Machine code for function multiple_connected_components_dead: NoPHIs, TracksLiveness 0B bb.0: 16B dead %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec 32B dead %0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec # End machine code for function multiple_connected_components_dead. *** Bad machine code: Multiple connected components in live interval *** - function: multiple_connected_...

[LLVMdev] Scheduling question (memory dependency)

2012 Sep 20

...---------------------------------- ********** MACHINEINSTRS ********** # Machine code for function _Z5check3foos: Post SSA Frame Objects: fi#-1: size=2, align=2, fixed, at location [SP+50] Function Live Ins: %X3 in %vreg1, %X4 in %vreg2 0B BB#0: derived from LLVM BB %entry Live Ins: %X3 %X4 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2 32B %vreg1<def> = COPY %X3; G8RC:%vreg1 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1 64B %vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4 ... ------------------------...

[LLVMdev] Scheduling question (memory dependency)

2012 Sep 21

...------------------------------------------------------------------ One notable difference is the "!tbaa !0" decoration on the load. I don't know whether this helps or not. Later the lowered instructions look like: ------------------------------------------------------------------ 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2 32B %vreg1<def> = COPY %X3; G8RC:%vreg1 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1 64B %vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0 ... ----------------------...

[LLVMdev] Scheduling question (memory dependency)

2012 Sep 21

...-------------------------------- > > One notable difference is the "!tbaa !0" decoration on the load. I > don't know whether this helps or not. Later the lowered instructions > look like: > > ------------------------------------------------------------------ > 16B %vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2 > 32B %vreg1<def> = COPY %X3; G8RC:%vreg1 > 48B STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] > G8RC:%vreg1 > 64B %vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0 > ....

registering icecast server in shoutcast directory?

2005 May 26

...nd would add a lot of functionality to the lists: http://www.htdig.org/ TIA -- Tom ====================================================================== "Z-80 system stack overflow. Shut 'er down Scotty, the system's sucking mud" - Error message on TRS 80 Model-16B Thomas D. Simes simestd@netexpress.com ======================================================================

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 23

...22 2015 at 3:32:53 PM Chris Sears <chris.sears at gmail.com <mailto:chris.sears at gmail.com>> wrote: > The status quo is: > > a) 40b REX+BT instruction for the 64b case > b) 48b TEST for the 32b case > c) unless it's small TEST > > You are currently paying a 16b penalty for TEST vs BT in the 32b case. > That may be worth testing the -Os flag. > > You'll want -Oz here, Os isn't supposed to affect the runtime as much as this is going to. > > -eric > > _______________________________________________ > LLVM Developers mailin...
