similar to: [LLVMdev] Saving one part of a register pair in the callee-saved list.

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Saving one part of a register pair in the callee-saved list."

2012 Jul 11
0
[LLVMdev] Saving one part of a register pair in the callee-saved list.
Hi Borja, On Jul 10, 2012, at 6:26 PM, Borja Ferrer wrote: > Hello, > > I would like to know if there's a way of setting the callee-saved register list inside getCalleeSavedRegs() to make the PEI pass save/restore only one half of a register pair if the other half is not being used, instead of saving the whole pair. Here is an example of what I try to explain to make things more
2014 Feb 08
3
[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions
On Fri, 7 Feb 2014, Timothy B. Terriberry wrote: > Martin Storsjo wrote: >> This is required in order to build using the built-in assembler >> in clang. > > These patches break the gcc build (with "Error: bad instruction"). Ah, right, sorry about that. > Documentation I've seen is contradictory on which order ({cond}{size} or > {size}{cond}) is correct.
2017 Oct 09
4
{ARM} IfConversion does not detect BX instruction as a branch
Hi all, I got a silly bug when compiling our project with the latest Clang. Here's the outputted assembly: > tst r3, #255 > strbeq r6, [r7] > ldreq r6, [r4, r6, lsl #2] > strne r6, [r7, #4] > ldr r6, [r4, r6, lsl #2] > bx r6 For the code to execute correctly, either the _ldr_ should be a _ldrne_ instruction or the _ldreq_ instruction should be removed. The error seems to
2017 Oct 20
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
On 20 October 2017 at 09:24, Ingo Molnar <mingo at kernel.org> wrote: > > * Thomas Garnier <thgarnie at google.com> wrote: > >> Change the assembly code to use only relative references of symbols for the >> kernel to be PIE compatible. >> >> Position Independent Executable (PIE) support will allow extending the >> KASLR randomization range below
2014 Oct 24
3
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
Hi, I noticed a significant performance regression (up to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems to be that IndVarSimplify widens induction variables assuming arithmetic on wider integer types is as cheap as on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target
2010 Sep 21
1
[LLVMdev] Possible missed optimization on function calling?
Hello, I noticed that the following code could be improved a little further. If the optimization is too tricky for the compiler, or it's done this way by design, forgive me, but in any case I just wanted to point it out. Consider the following C code: extern int mcos(int a); extern int msin(int a); extern int mdiv(int a, int b); int foo(int a, int b) { int a4 =
2017 Dec 01
2
Some strange i64 behavior with arm 32bit. (Raspberry Pi)
Hi Tim, thanks for the swift response! @debug is defined in the same module, which makes this all the more confusing. The target information from the working example are: target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "armv6kz--linux-gnueabihf" from the ghc produced module: target datalayout =
2020 Apr 07
2
[ARM] Register pressure with -mthumb forces register reload before each call
If I'm understanding what's going on in this test correctly, what's happening is: * ARMTargetLowering::LowerCall prefers indirect calls when a function is called at least 3 times in minsize * In thumb 1 (without -fno-omit-frame-pointer) we have effectively only 3 callee-saved registers (r4-r6) * The function has three arguments, so those three plus the register we need to hold the
2016 Nov 17
2
Loop invariant not being optimized
I've got an example where I think that there should be some loop-invariant optimization happening, but it's not. Here's the C code: #define DIM 8 #define UNROLL_DIM DIM typedef double InArray[DIM][DIM]; __declspec(noalias) void f1( InArray c, const InArray a, const InArray b ) { #pragma clang loop unroll_count(UNROLL_DIM) for( int i=0;i<DIM;i++) #pragma clang loop
2016 Nov 18
2
Loop invariant not being optimized
I tried changing 'noalias' to 'restrict' in the code and I get: fma.c:17:12: warning: 'restrict' attribute only applies to return values that are pointers It seems like 'noalias' would be the correct attribute here, from the article you linked: "if a function is annotated as noalias, the optimizer can assume that, in addition to the parameters themselves,
2004 Oct 06
3
flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)
Sadly the latest optimization completely broke everything. The asm code isn't gas compliant. The libFLAC linker script has a typo, and disabling the asm optimization and/or altivec won't give a correct build anyway. Instant fixes for the asm stuff: sed -i -e"s:;:\#:" on the lpc_asm.s to load address instead of addis+ori you could use lis and la and PLEASE use the @l(register)
2013 Oct 03
1
[LLVMdev] Help with a Microblaze code generation problem.
Sorry if this is a duplicate: I tried to send it last night and it didn't go through. I'm trimming some text to see if it helps. I have a simple program that fails on the Microblaze: int main() { unsigned long long x, y; x = 100; y = 0x8000000000000000ULL; return !(x > y); } As you can see, the test case compares two unsigned long long values. To try to track
2009 Feb 13
3
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
It seems to me that LLVM sub-register support does not fit the following hardware architecture. All instructions of the hardware are vector instructions. All registers contain 4 32-bit FP sub-registers. They are called r0.x, r0.y, r0.z, r0.w. Most instructions write more than one element in this way: mul r0.xyw, r1, r2 add r0.z, r3, r4 sub r5, r0, r1 Notice that the four elements of r0 are written
2011 Jun 03
2
modify a data frame by values in the columns
I have a data frame like this: col1 col2 r1 2 1 r2 4 3 r3 6 5 r4 8 7 r5 10 9 r6 12 11 r7 14 13 r8 16 15 r9 18 17 r10 20 19 I want to modify this data frame, for example, assign -1 to both col1 and col2 in every row where the value in col1 is less than 12 and the value in col2 is greater than 10. The result should look like this: col1
2011 Feb 16
2
fwd: fix up ARM assembly to use 'bx lr' in place of 'mov pc, lr'.
hello vorlon, got notified of your patch, will apply next days upstream unless some critiques are voiced on ml. thanks. -- maks ----- Forwarded message from Steve Langasek <steve.langasek at canonical.com> ----- Date: Wed, 16 Feb 2011 22:05:42 -0000 From: Steve Langasek <steve.langasek at canonical.com> Subject: [Bug 527720] Re: thumb2 porting issues identified: klibc uses
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
Change the assembly code to use only relative references of symbols for the kernel to be PIE compatible. Position Independent Executable (PIE) support will allow extending the KASLR randomization range below the -2G memory limit. Signed-off-by: Thomas Garnier <thgarnie at google.com> --- arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++----- arch/x86/crypto/aesni-intel_asm.S
2011 Aug 13
1
Own R function doubt
Hi to all the people again, I was writing a simple function in R, and wish to collect the results in an Excel file. The work goes as follows, Ciervos<-function(K1, K0, A, R,M,Pi,Hembras) {B<-(K1-K0)/A T1<-(R*Pi*Hembras-M*Pi+B)/(Pi-M*Pi+R*Pi*Hembras) P1<-Pi-B R1<-P1*Hembras*R M1<-P1*M T2<-(R1-M1+B)/(P1-M1+R1) P2<-P1-B R2<-P2*Hembras*R M2<-P2*M
2007 Nov 21
3
[LLVMdev] Add/sub with carry; widening multiply
I've been playing around with llvm lately and I was wondering something about the bitcode instructions for basic arithmetic. Is there any plan to provide instructions that perform widening multiply, or add with carry? It might be written as: mulw i32 %lhs %rhs -> i64 ; widening multiply addw i32 %lhs %rhs -> i33 ; widening add addc i32 %lhs, i32 %rhs, i1 %c -> i33 ; add with carry
2008 Aug 07
6
[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR
On Tuesday 05 August 2008 13:27, David Greene wrote: > Neither solution eliminates the need for instcombine to be careful and > consult masks from time to time. > > Perhaps I'm totally missing something. Concrete examples would be helpful. Ok, so I took my own advice and thought about CSE and instcombine a bit. I wrote the code by hand in a sort of pseudo-llvm language, so